ABSTRACT


A computer may gather a lot of information from its environment 
in an optical or graphical manner. 

A scene, as seen for instance from a TV camera or a picture, can 
be transformed into a symbolic description of points and lines or surfaces. 
This thesis describes several programs, written in the language CONVERT, 
for the analysis of such descriptions in order to recognize, differentiate 
and identify desired objects or classes of objects in the scene. Examples 
are given in each case. 

Although the recognition might be in terms of projections of 2-dim 
and 3-dim objects, we do not deal with stereoscopic information. 


One of our programs (Polybrick) identifies parallelepipeds in a 
scene which may contain partially hidden bodies and non-parallelepipedic 
objects. The program TD works mainly with 2-dimensional figures,
although under certain conditions it successfully identifies 3-dim objects.
Overlapping objects are identified when they are transparent. 

A third program, DT, works with 3-dim and 2-dim objects, and does 
not identify objects which are not completely seen. 

Important restrictions and suppositions are: (a) the input is 
assumed perfect (noiseless), and in a symbolic format; (b) no perspective 
deformation is considered. 

A portion of this thesis is devoted to the study of models 
(symbolic representations) of the objects we want to identify; different 
schemes, some of them already in use, are discussed. 

Focusing our attention on the more general problem of identification
of general objects when they substantially overlap, we propose some schemes
for their recognition, and also analyze some problems that are met.


Thesis Supervisor: Marvin L. Minsky 


Title: Professor of Electrical Engineering. 
CHAPTER I. EXPOSITION


The goal.- Given a scene, as seen for instance from a TV camera or a picture,
it is desired to analyze it in order to recognize, differentiate and
identify desired objects or classes of objects (i.e., patterns) in it.


The problem.- A picture, scene or view is read with the help of an optical 
device and stored as an array of light intensities in the memory of the 
computer. The ultimate goal will be to understand this information, that 
is, to identify, separate and position the different objects or bodies 
belonging to the scene(s). The demands of information will vary: sometimes 
we will be interested in knowing if an object is seen in the scene or not, 
while at other times we may require a complete description of the scene, 
including information on relative support and (3-dim) position of the 
different components. Hence it is clear that the recognizer will need an 
additional input to specify the nature of the question that the program 
is to answer by analyzing the scene. 
Some work has been done by the author, specifically in the area
of "recognition" (see below). This thesis describes the general problem,
its difficult points, possible solutions, and specific attempts by the
author and also by some others.
The work is divided into two parts: preprocessing, which converts
the input into symbolic data, and recognition, which studies these
data and, with the help of a model of the object we are searching
for, finds all instances of that object in the scene in question.




The different chapters of this thesis.- The array containing the scene is
swept and transformed by the preprocessor (chapter 3), which converts the
picture into a more compact (and perhaps symbolic) form of information.
Sometimes a syntactical analysis (end of chapter 3) of this data is enough
to recognize the objects we are interested in. In general, the problems found
(chapter 2) require the use of more sophisticated weapons. Very often
it is necessary to specify a model (chapter 7) of the objects we want to
find, and a considerable part of this thesis (chapter 7) is devoted to
the different models and their characteristics. Some schemes for recognition
are proposed and discussed in chapter 8, using the models of chapter 7 and
assuming we have the scene preprocessed as in chapter 3; problems are taken
into account as in chapter 2. Finally, three particular schemes were
implemented, and are described in chapters 4, 5 and 6.


Contents. 


Chapter 1. Exposition (just done).
Chapter 2. Problems found.
Chapter 3. Preprocessor.
Chapter 4. Polybrick.
Chapter 5. TD.
Chapter 6. DT.
Chapter 7. Models.
Chapter 8. Discussion of some schemes for recognition.
CHAPTER II. PROBLEMS FOUND


This chapter lists a number of important problems present for any
3-dim recognition system. Some of these problems are discussed in the
chapters which cover Polybrick, TD and DT; others are only mentioned here
or slightly discussed.

Solutions, approaches and lines of thought are given when
available. In particular, some of the problems encountered by the
recognizer are treated in the chapter about models (chapter 7).

These problems generally fall into two categories: they are either general,
or caused by the particular method or approach. It should also be
mentioned that this chapter gives a description, rather than an evaluation,
of some of the ways to solve the problems found.


No hardware difficulties are discussed. 


Occlusion.- Since objects in the scene may be partially behind others,
the recognizer has to be able to find instances of a given object even
when only a part of it is actually seen in the picture.

Small parts of an object. If an object is totally occluded in
the scene except for a small part of it, identification becomes difficult
and ambiguous. Here, the recognizer could use context or statistical
information (1) to resolve the ambiguity, or report the small part as
being a portion of one of several possible objects. Note that this problem
is one of lack of enough information.

For instance, Polybrick (2) --somewhat arbitrarily-- decides to
identify as cubes (parallelepipeds) corners of the form A B C D (see fig.
'AMBIGUOUS').
Fig. 'AMBIGUOUS'. The corner A B C D
may belong to several objects of different
shapes.


(1) As done, for instance, in W. W. Bledsoe and I. Browning [2].


(2) See chapter 4 of this thesis. 


Degenerate positions.- Probably most recognizers will fail to identify
the objects in figure 'CONES' as cones.


Fig. 'CONES'. Degenerate positions
are difficult to deal with.


Fig. 'CONES' also shows that there are degenerate positions with non-zero
probabilities. Considerable finesse will be required from the preprocessor and
its surface functions (see chapters 3 and 6) in order to get the hint "this
is a degenerate case". Other kinds of information may also help: shading,
shadows, knowledge of support structure, etc. If the recognizer has no
idea that it is dealing with this case, it can do little to identify
the body correctly, unless the frequency of these cases is such that special
software is devoted to them. Once the recognizer suspects a degeneracy,
the special machinery is applied to it.

Heuristic: watch out for isolated single regions surrounded by background
or not otherwise explained.


Hidden Lines.- In chapter 7, talking about transparent or 3-dimensional
models, we assert that we must know what lines or regions of an object are
hidden by the same object, with respect to the different views of such
a body.




Perspective.- Parallel lines are no longer parallel, but converge
at the horizon. Accurate measurements have to be made if we want to
use this information in order to know the position of the objects with
respect to the observer.

Polybrick, DT and TD ignore this problem, under the assumption
that we are working with small objects and/or are far from them.


Spurious regions.- In using different surface functions or predicates for
finding good regions (see in chapter 3, the section "the 'summer vision
group' approach"), there may be overlapping among the found regions,
duplicated regions, bad regions, etc. (discussed below).


Bad regions.- Almost any surface function will occasionally find a region
which is considered "bad", in the sense that it does not match exactly
the outline of the face to which it corresponds. It is well known,
for instance, that intensity contour levels of a scene do not follow
closely the outlines of the objects [21]; over a flat surface, they form
ugly distributions, as fig. 'LEVELS' indicates.

Problems to be solved by the executive or the recognizer are what
function to employ in each particular case, or else how to decide if the
produced region is acceptable. Feedback between the recognizer and the
preprocessor (see "the generalized region approach" in chapter 8) is
needed at this point. Read also chapter 7 to see how models get involved.

It would be a good idea to have a stack of functions useful in
particular conditions; their utility could be further increased if we are
able to compose them, that is, to apply one function to the result of
another. Chapter 3 talks about this.
Also, it will be worthwhile to have an easy framework for
testing different predicates manually, in order to collect the stack of
functions mentioned above (1).


Fig. 'LEVELS'. (a) A plane is illuminated by two
concentrated sources. (b) Equal intensity curves.
If we use a cutoff value of intensity to identify
the region, our result will bear little resemblance
to a rectangle.


(1) A step in this direction is EYE [30].


Overlapping regions.- The big repertory of functions (in the "region
approach", chapter 8) suggests that the same region could be found more
than once and, moreover, that different functions when applied to the
same face or zone will in fact return different regions; if there is
little discrepancy in the boundary of two regions which otherwise have
the same center of figure, extension, etc., we could conclude that they
are the same and keep the more reliable one.

If two regions overlap considerably but one is significantly
bigger than the other, we may suppose that they are composed of smaller
regions, and that we should subdivide them, using a more delicate
surface function. Our hypothesis could be tested by taking the
intersection of these two regions and sending a specialized feature-seeker (see
chapter 3) to find out if the region formed by the intersection could be
detected in a different manner.
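
The two cases just described (regions that are really the same, and regions
that should be subdivided) can be sketched as follows. This is only an
illustration in Python, not one of the programs of this thesis: regions are
assumed to be sets of (x, y) points, and the name compare_regions and both
tolerances are hypothetical choices.

```python
def compare_regions(r1, r2, same_tol=0.9, size_ratio=2.0):
    """Classify two regions (sets of (x, y) points) by how they overlap."""
    common = r1 & r2
    if not common:
        return "disjoint"
    small, big = sorted((r1, r2), key=len)
    # Fraction of the union on which both regions agree.
    agreement = len(common) / len(r1 | r2)
    if agreement >= same_tol:
        return "duplicate"      # little discrepancy: keep the more reliable one
    # Considerable overlap, but one region is much bigger than the other.
    if len(common) >= 0.5 * len(small) and len(big) >= size_ratio * len(small):
        return "subdivide"      # examine the intersection with a finer function
    return "overlap"


def square(x0, y0, side):
    """Helper for the example: a filled square region."""
    return {(x, y) for x in range(x0, x0 + side) for y in range(y0, y0 + side)}
```

For instance, compare_regions(square(0, 0, 10), square(0, 0, 4)) asks for a
subdivision, since the small region lies wholly inside the big one.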


Duplicated regions.- These are overlapping regions whose discordance is
small. Since their boundaries are not exactly equal, we still have to find
a criterion for choosing the best boundary.


Regions which are not there.- (Highlights, shadows, reflections, etc.) A
number of regions will be found which are not 'actually' there. Small
regions caused by camera noise can be eliminated due to their smallness (1).
As we point out in the paragraph "bad regions", wrong surface functions are
mainly responsible for these monstrosities.
It is possible that the curved surface of the cylinder in figure
'CYLIN' will be reported as two regions if we use as function the constancy
of intensity, and even if we use the constancy of variation (first derivative).
The recognizer should be aware of this possibility.


(1) This is done by Larry Krakauer [21]. Note, however, that his program
is not designed with the idea of finding "good" regions.


Fig. 'CYLIN'. Example of regions which are not there. 
If the surface function is simple enough, two regions 
may be found where there is only one. 


Non-interesting regions.- When seeing a box with small letters written
on its sides, or a piece of wood showing its grain, the preprocessor
tends to find a multitude of small regions, which are uninteresting to
the recognizer. We have to recognize them as "non-important" because of
their smallness, regularity, or some more appropriate property, use
their position to help construct the 'real' region containing them
--the interesting one-- and finally ignore them.


Shadows.- Shadows over the surfaces of bodies in process of identification
complicate this task, although they may reveal information about the shape
of surfaces. A good way to discriminate them is to use as surface function
the composition of the light, that is, the ratio of some color to the
total lumens/m², instead of the plain intensity (assuming we have color
perception).


Spurious Lines.- A line-follower (1) could get trapped into spurious lines,
and the same applies to the region finder (see chapter 3), when it has to
work with noisy input.

Such spurious lines can be eliminated by their short length, and
on a higher level by the fact that they do not "fit" into the boundary
of a shape for which there is good, independent, evidence.


The α-λ transformation.- The following scheme (2) is useful in detecting
undesired lines when dealing with rectilinear bodies. Given an array
containing elementary segments (a small number of points plus a direction
associated with them), as indicated in figure 'LITTLE SEGMENTS', we associate
with each segment a pair of numbers: α is the angle that this segment forms
with the x-axis, and λ is the distance of the (extended) line from the
origin.

That is to say, we convert the figure to an array of points (see
figure 'CLUSTERING'). In the α-λ space, points which fall close together


(1) The following programs are typical line-followers, and are confronted
by the mentioned problem: 1. Sides 21 [10]; 2. Polygon detector [23].


(2) This scheme is used by Nilsson at Stanford Research Institute in the
visual part of their robot.
Fig. 'LITTLE SEGMENTS'.
A scene after processing by a gradient
operation. Irrelevant segments have to be
taken out.


Fig. α-λ. A little segment S is
represented by the pair (α, λ).


Fig. 'CLUSTERING'. Clouds with the
same α are parallel sides.
are over the same line, so that a frequency count will eliminate the spurious
segments, if desired. Clouds with the same α are parallel lines, and
this fact could be used in order to look for parallel lines.

Smooth curved lines could also be detected by this method, if we
use a fancier criterion for the detection of clusters (1).
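
The transformation and the frequency count can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the
program used at Stanford Research Institute: each elementary segment is
taken to be a triple (x, y, direction in degrees), λ is computed as a signed
distance, and the function names, cell sizes and min_count are hypothetical.

```python
import math
from collections import Counter


def alpha_lambda(x, y, direction_deg):
    """Map an elementary segment at (x, y) to the pair (alpha, lam).

    alpha is the angle with the x-axis folded into [0, 180); lam is the
    signed distance of the extended line from the origin.
    """
    alpha = direction_deg % 180.0
    a = math.radians(alpha)
    # (-sin a, cos a) is a unit normal to the line; project the point on it.
    lam = -x * math.sin(a) + y * math.cos(a)
    return alpha, lam


def frequent_lines(segments, alpha_step=5.0, lam_step=1.0, min_count=3):
    """Quantize (alpha, lam) into cells and keep the well-populated cells."""
    cells = Counter()
    for x, y, d in segments:
        alpha, lam = alpha_lambda(x, y, d)
        cells[(round(alpha / alpha_step), round(lam / lam_step))] += 1
    return {cell: n for cell, n in cells.items() if n >= min_count}
```

Segments whose cell is not returned are the candidates for elimination. Cells
sharing the same first component are candidate parallel lines. Directions
very close to 0 and to 180 degrees fall in different cells in this sketch.
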

Spurious Points.- Often, after the application of some surface
function (again, we are assuming the region-approach) to some part of the
scene, the result will exhibit some isolated or irrelevant points, which
have to be eliminated from the region. These points could be swept out
by averaging and then applying a threshold. There are also the so-called
noise-eliminators, line-thinners, and so on, widely used in the field of
character-recognition (2). Disturbances coming from noise in the camera
could be mitigated by reading the same spot several times and averaging.
This technique wastes time.
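
Averaging followed by a threshold can be sketched as below. This is an
illustrative Python fragment, not a program from the literature cited: the
region is assumed to be a rectangular list of rows of 0s and 1s, the
neighborhood is 3 by 3, and the name clean_region and the threshold value
are hypothetical.

```python
def clean_region(mask, threshold=0.4):
    """Sweep isolated points out of a 0/1 picture.

    A point is kept only if it is set and the average of its 3x3
    neighborhood (clipped at the picture border) reaches the threshold.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            mean = sum(window) / len(window)
            out[r][c] = 1 if mask[r][c] and mean >= threshold else 0
    return out
```

A higher threshold removes more noise but also begins to erode the corners
of genuine regions.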


Range of brightness.- When comparing brightnesses, it is advisable to use
their ratios, or differences between logarithms. This helps make the system
invariant to changes in illumination levels.
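
The invariance can be checked with a few lines of Python. The function name
log_contrast is a hypothetical one, and brightnesses are assumed to be
positive numbers.

```python
import math


def log_contrast(b1, b2):
    """Difference between the logarithms of two brightnesses.

    This equals the logarithm of their ratio, so multiplying both
    brightnesses by the same illumination factor leaves it unchanged.
    """
    return math.log(b1) - math.log(b2)
```

For example, log_contrast(40, 10) and log_contrast(400, 100) agree up to
rounding, whereas the plain differences 30 and 300 do not.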


(1) Work in this area is: 1. Evan L. Ivie. PhD Thesis [18].
2. Probably a conventional pattern classifier will
do it. See N. J. Nilsson [26].


(2) A good collection of references is in [9]. See there [1].
CHAPTER III. PREPROCESSING


This chapter will cover some schemes that might be used to 
preprocess pictures before we can use our "symbolic description" 
recognition methods. 

The preprocessor is responsible for taking the scene as read
into memory --generally as an array of numbers which correspond to the
intensity or brightness of the points in the picture, scene, film, etc.--
and transforming it into a smaller but more usable amount of information,
usually as a highly organized description, in symbolic format perhaps, of
points, regions, lines, surfaces.

The main goal of a preprocessor is to throw away as much information
as it can, while at the same time keeping the relevant facts in an
organized structure. Most of them perform a local operation over a point
and its neighbors, producing an output that depends only on the values of
the intensity in a small neighborhood.




EQUAL INTENSITY CONTOURS (global threshold algorithms) 


The CNTOUR program.- This program [21] plots an intensity relief map of
an image which is read from the vidisector camera (TV-B) attached to the
PDP-6 computer. For high-contrast images, it produces something like a
line drawing.

A contour is a set of closed curves enclosing all those points in
an image whose intensity is greater than a specified threshold. These
contours correspond to the contours of a relief map, and not to the
boundaries of an object. Thus, except for high-contrast pictures, equal
intensity contours do not match (do not follow closely) the boundaries of
a region.


Local threshold.- Something may be gained if, instead of cutting at a
pre-set global threshold, we make a histogram (fig. 'HISTOGRAM') or
frequency count of the scene under consideration. If this is a
sharp-contrast figure, significant peaks will be found, and then we may put
thresholds in the valleys (see fig. 'HISTOGRAM').
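
A minimal sketch of this idea in Python follows. It is an illustration, not
one of the programs cited: intensities are assumed to be integers in
range(levels), and the name valley_thresholds, the peak criterion and
min_peak are hypothetical choices.

```python
def valley_thresholds(intensities, levels=16, min_peak=2):
    """Place one threshold at the lowest histogram bin between two peaks."""
    hist = [0] * levels
    for v in intensities:
        hist[v] += 1
    # A peak is a bin higher than its left neighbor, not lower than its
    # right neighbor, and populated by at least min_peak points.
    peaks = [i for i in range(levels)
             if hist[i] >= min_peak
             and (i == 0 or hist[i] > hist[i - 1])
             and (i == levels - 1 or hist[i] >= hist[i + 1])]
    thresholds = []
    for left, right in zip(peaks, peaks[1:]):
        # Two peaks are never adjacent, so this range is not empty.
        thresholds.append(min(range(left + 1, right), key=lambda i: hist[i]))
    return thresholds
```

With a flat or noisy histogram this criterion finds many small peaks, so
some smoothing of the histogram would be needed in practice.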

The output of these programs has to be fed to a line-fitter, in
order to get numbers, slopes, etc., out of the lines.
Fig. 'HISTOGRAM'. Local thresholds
could be put at A, B and C.


LINE FOLLOWERS 


When there exists a sharp contrast between the different surfaces, it is
usually possible to follow fairly well the boundaries of two zones which
differ greatly in brightness, using what is called a line follower.
This program sends a probe that travels the scene looking for a place
where the intensity changes abruptly, and then travels along this change
or discontinuity; in order to achieve this, we may think of it (the probe)
as having two legs, each one kept in a different zone. This should
be taken only as a pictorial description.

The output of a line follower is a set of lines, which often has
to be processed slightly more, in order to eliminate very small lines,
and in order to merge several fairly collinear segments (1). For instance,
it may be difficult to find the exact location of the corners (places
where two lines intersect); one can instead follow the lines until near
the corner and then, after the complete set of lines is found, use an
analytical interpolation to find the "best" corners (2).


From the many line followers that exist, we will present two.


Sides 21.- This program [10] for the PDP-6 computer uses a box with a
zone of tolerance (see fig. 'BOX'), the width and length of the box
being functions of the length of the portion of line already found and
of the noise present in the picture. It uses CORNS (2). The program searches
for the maximum gradient; the relative location of the maximum gradient
with respect to the box is also known to the program, and this information
is subsequently used to steer the box. If the line is within the
acceptance box, the program considers that the correct location of the line has
been found, and thus will track further. In the case of a noisy edge, the
box is widened as a function of how far the maximal gradient goes astray.
90% of the points must be inside the acceptance box before the box will
extend. When the program believes it has arrived at a corner, it will
move the box in several directions to find the possible emerging lines.


Fig. 'BOX'. The innermost rectangle is termed the acceptance
box; the outer two rectangles are collectively termed the
looking box. When tracking, the less sharply defined the edge
is, the wider the box must be to successfully track it.


(1) See in this respect 'spurious lines' in chapter 2; more detailed
information is found in Cyclops-1 [24].


(2) CORNS [32] is a program that does this.
Finding the edges of a polygon.- Mann [23] also uses an edge detector in
order to find the different sides of a rectilinear polygon; it first
searches in a rectangular grid, until it observes an abrupt change in the
intensities. Noting this point, it continues until the intensity is
stable again. It returns the average of these points as the edge.
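
Along a single line of the grid, the procedure just described can be
sketched as follows. This Python fragment is our own illustration and not
Mann's program [23]: the scan is assumed to be a list of intensities, and
the name edge_position and the value of jump (what counts as an abrupt
change) are hypothetical.

```python
def edge_position(scan, jump=10):
    """Locate an edge along one scan line of intensities, or return None.

    The edge is reported as the average of the index where the intensity
    starts to change abruptly and the index where it is stable again.
    """
    start = None
    for i in range(1, len(scan)):
        abrupt = abs(scan[i] - scan[i - 1]) >= jump
        if start is None and abrupt:
            start = i - 1                    # last stable point before the change
        elif start is not None and not abrupt:
            return (start + (i - 1)) / 2     # i - 1 is stable again
    # The change may run up to the end of the scan line.
    return None if start is None else (start + len(scan) - 1) / 2
```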


STACKS OF SUCCESSIVE TRANSFORMATIONS (1)
We may transform pictures, that is, arrays of intensities (arrays of
numbers) into new arrays of numbers which, generalizing, could also be
considered as pictures. So, we would have functions which transform
pictures into pictures, and we could stack them, that is, compose them.
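
Stacking can be sketched in a few lines of Python. The fragment is only an
illustration: pictures are assumed to be lists of rows of numbers, and
compose, smooth_rows and threshold are hypothetical names, not functions of
any program mentioned in this chapter.

```python
from functools import reduce


def compose(*stages):
    """Stack picture-to-picture functions; the first stage is applied first."""
    return lambda picture: reduce(lambda pic, stage: stage(pic), stages, picture)


def smooth_rows(picture):
    """Intensity picture -> intensity picture: average horizontal neighbors."""
    out = []
    for row in picture:
        new_row = []
        for j in range(len(row)):
            window = row[max(0, j - 1):j + 2]
            new_row.append(sum(window) / len(window))
        out.append(new_row)
    return out


def threshold(level):
    """Build an intensity picture -> point picture stage."""
    return lambda picture: [[1 if v > level else 0 for v in row] for row in picture]
```

A preprocessor is then a value such as compose(smooth_rows, smooth_rows,
threshold(4)), which can itself be used as a stage of a longer stack.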


The following table shows some of the different possibilities.


FROM                     TO (intensity pictures, point pictures,
                         line pictures, symbolic descriptions)

Intensity pictures       averaging; contours; gradient; laplacian; color;
                         smoothing; threshold operations; Kirsch package;
                         region finder.

Point pictures           line fitters.

Line pictures            White's program; fitters.

Symbolic descriptions    Martin's display; display programs; TD programs;
                         models-to-patterns compiler.
In this table, the lower triangle corresponds to display; the upper
triangle, to preprocessing.

Now, a preprocessing may be considered as a path between the
upper left square and the lower right square; typically, we transform
intensity pictures into intensity pictures several times, and then
apply a transformation from intensity pictures to point pictures, etc.
(see diagram below). Nevertheless, due to the fact that "local
preprocessing" is expensive (time consuming), the preprocessing could be
under 'global' control --more complicated loops would appear in the
diagram below-- so that only difficult regions are transformed much.
This is discussed somewhat in 'the generalized regions approach', in
chapter 8.


Fig. 'GRAPH'. Typical flow-chart of a preprocessor. 


(1) I got this idea from T. Marill.
SOME OTHER METHODS.
Laplacian. Gradient.- They are local operations. See [5].

Ridge detectors.- See [29].

Logical (boolean) preprocessing.- See Kirsch [20].

A. Rosenfeld and J. L. Pfaltz.
J. ACM 13, 4; pp. 471-494 (Oct. 66).

Hodes line follower.- See [17].


REGIONS. THE "SUMMER VISION GROUP" APPROACH [27]


A program [33] sweeps the array containing the scene and collects
sets of points satisfying a given predicate; these sets are called
regions, and roughly correspond to the different faces of the objects.
It is entirely possible, but undesirable, that two or more regions
will be reported as one. To find good regions is not a trivial task.

Another program [22] will drive the region-finder, supplying 'good'
predicates; the boundaries will be sorted and "smoothed", and the bad
regions eliminated and/or merged.

A further preprocessing will then be done [37, 38], interpolating
straight lines, segments of curves, etc., until finally each region is
described by a set of properties, in the so-called region-notation (see
chapter on models).

This input is the one which the recognizer (for instance, [16])
will use.
Syntactical analysis of figures.- We cite the following references:

1. Ledley, R. S., and Wilson, J. B. Automatic programming language
translation through syntactical analysis. Comm. ACM 5, No. 3, March 1962.

2. Ledley, R. S., Rotolo, L. S., Belson, M., Jacobsen, J.
Pattern recognition studies in the biomedical sciences. SJCC, 1966, vol.
28, pp. 411-430.

3. Narasimhan, R. Labeling schemata and syntactic descriptions of
pictures. Inform. Control 7 (1964), 151-179.

4. Hodes, L. [17].

5. Cyclops-1 [24].




CHAPTER IV. POLYBRICK 


This chapter presents a description and discussion of
Polybrick. This is a program that recognizes 3-dimensional parallelepipeds
(solids limited by 3 pairs of parallel planes), using as data 2-dimensional
orthogonal projections.

Under the name CUBE LISP, a version of this program is running in
the CTSS 7094 Time Sharing System of Project MAC, MIT. A more complete
description and a listing of the program are found in a MAC memo [13].


Polybrick is written in the CONVERT language [14]. 
Introduction.- The programs contained in this chapter solve the following
problem:

A scene contains noise-free parallelepipeds without perspective
effects, but partially occluding one another. Extraneous rectilinear
objects other than parallelepipeds may be present.

Problem: what parallelepipeds (hereafter called sloppily "cubes")
are there and where are they (in the 2-dim picture)?

An answer is considered bad when it misses some cube, or if it
confuses some. On the other hand, ambiguous cubes or partially-identified
ones should be reported as such.

The program should also give the position of such cubes, to the
extent such information is available.


Input to the program.- Eventually the program will read its data directly
from the world (1). Right now, the picture is transformed (by hand) to a
list of corners and points of intersection (real or virtual) of lines,
and their --two-dimensional-- coordinates in the picture, together with
their nearest adjacent points.

For example, the input associated with fig. '#CUBE' is
(A (B F) B (A G C) C (B D) D (G E C) E (F D) F (A E G) G (F B D))


Fig. '#CUBE'. A cube
showing its vertices.
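
To make the format of the description list concrete, the following Python
sketch reads such a list into a table of neighbors. It is an illustration
only (Polybrick itself is written in CONVERT); the names parse_description
and is_symmetric are hypothetical, and coordinates are left out.

```python
import re


def parse_description(text):
    """Read '(A (B F) B (A G C) ...)' into {vertex: [neighbors]}."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    if tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("description must be one parenthesized list")
    graph, i = {}, 1
    while i < len(tokens) - 1:
        vertex = tokens[i]
        if tokens[i + 1] != "(":
            raise ValueError("expected a neighbor list after " + vertex)
        close = tokens.index(")", i + 2)     # end of this neighbor list
        graph[vertex] = tokens[i + 2:close]
        i = close + 1
    return graph


def is_symmetric(graph):
    """Check that every line is listed from both of its end points."""
    return all(v in graph.get(n, []) for v, ns in graph.items() for n in ns)
```

Applied to the list for fig. '#CUBE', this gives seven vertices and a
symmetric table, i.e. nine lines each listed twice.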


(1) See chapter on preprocessing.


            Coordinate
Vertex      X         Y
A           1         4
B           1         6
C           54/g      711
D           5         8
E           6         10
F           8         11
G           14        8
H           13        6
I           11        5
J           9         6
K           9         4
L           7         1
M           5         1
N           5         3
O           5.7213    4.0909
P           3         3
Q           3         5
R           8         7
S           6         8
T           8         6.5
U           8         6
V           7         6
W           7         3
X           12        7


Fig. GORDO. To its right is its description list, the input to the program.
The numbers are stored in the property list of each vertex.
Format of the answer.- We use the CONVERT processor and apply the function
cube (in the file "CUBE LISP") to the picture GORDO (in file "gordo"). Here
is the operation in CTSS.


load (( a cube gordo)) 
(CERO UNDECLARED) 
(CERO UNDECLARED) 
NIL 
e (cubs gordo) 
( THERE ARE AT LEAST 3 OR 3 CUBES) 


(CUBE 1 IS (N (W O M) W (N J I L) L (M K W)))
(CUBE 2 IS (I (J H X) G (F X H) X (E G I) E (X F D)))
(CUBE 3 IS (P (A O Q) R (S Q T) Q (B R P) B (Q C A)))


THE PROGRAMS.
They are written in CONVERT, a pattern-driven symbolic transformation
language [14], and we will discuss here the following:


CUBES2 LISP 000    Original; uses continuity.
CUBS LISP 000      Partitions the set into disjoint classes.
CUBA LISP 000      Final version; uses the unit distance method.
CUBE LISP 000      Breaks <- into <€- (not connected).


The last one is the one currently in use, but it is interesting
to talk about all of them.


CUBES2 LISP. Use of neighbourhood.
If a corner ( < ) is found, we look for a parallelogram ( <> )
which has that corner (we use here the information about which points are
joined to which); as usual, solid arrows in the flow chart indicate the
direction of success; broken ones, the direction of failure.
For example, in fig GORDO, CUBES2 proceeds in this way:


Now it tries again: it finds all the 3 cubes.


Shortcomings of CUBES2


The scheme just presented gives an idea of the power or weakness of
CUBES2. It is able to find connected cubes; for example, it solves fig GORDO
and fig COMMON, but it fails to find A B C D in the figure at the right
because it is formed of two disconnected parts (disconnected in the sense
that, in order to go from one part A D to the other B C, we have to cross
other cubes).
What to erase and what to leave (1)


Once all the points of a cube are found, we have to delete it from
the picture, in order to process the remainder. Or, if you do not want to
delete points, you still have to mark them as "already processed". This process
is explained with COMMON, the example in fig. 'COMMON'.

Once the cube K J I W U V F G H is found, we delete these points from
the graph. The point G, for example, is safely deleted, since its neighbors
F, H, and W also belong to the encountered cube. But F, for example, is
still not deleted, since it has as neighbors points outside (not belonging
to) the actual cube. Therefore, one pass through the graph eliminates all
the lines arriving at points in the cube; for example, F*(E* C* K) is transformed
to F*(E* C*), since K was in the cube. In this way we delete the line F*-K,
if we also make the transformation from K(U J F*) to K(U J).

Another pass looks for points of the form W( ), that is, points
"isolated" (not connected to anything else), and deletes them.

The first pass is done with the CONVERT rule
[ (XXX (YYY U ZZZ) WWW) (XXX (*REPT* ((YYY ZZZ) WWW))) ]
where we define U as "member of CUBEJUSTFOUND".

The second pass --deletion of isolated points-- is done with
[ (XXX X ( ) YYY) (XXX (*REPT* (YYY))) ].

In this way points shared by several cubes (like K) are preserved.

But not the lines; for example, the line K-U is erased (fig. "CHANGE"),
because it belonged to the cube K J I W U V F G, even if it also belongs to
the cube M N P D E F V T.


In general, there is no way to predict such an event, since the
second cube has not yet been found, and therefore there is no way to tell what
its parts are. We will discuss this point later.
In general, this is not a serious defect, but see the example TRICKY,
fig TRICKY.


(1) Canaday [4] treats this problem in a similar way.


Fig. COMMON after erasing cube K J I W U V F G.
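
The two erasing passes can be paraphrased in Python as below. This is a
sketch of one reading of the CONVERT rules quoted above, not a translation
of the actual program: the picture is assumed to be a table from each
vertex to its list of neighbors, and the name erase_cube and the small
graph in the usage note are hypothetical.

```python
def erase_cube(graph, cube):
    """Apply the two erasing passes to a {vertex: [neighbors]} description."""
    cube = set(cube)
    # First pass: drop every member of the cube just found from every
    # neighbor list, which removes the lines arriving at points in the cube.
    pruned = {v: [n for n in ns if n not in cube] for v, ns in graph.items()}
    # Second pass: delete isolated points, those left with an empty list.
    return {v: ns for v, ns in pruned.items() if ns}
```

Take the graph {"G": ["F", "H", "W"], "F": ["G", "K"], "H": ["G"],
"W": ["G"], "K": ["F", "P"], "P": ["K", "Q"], "Q": ["P"]} and erase the cube
G F H W K. The points G, F, H and W disappear, while K survives because it
still has the outside neighbor P.
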
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CUBS LISP. Classification of the corners. 
We want to be able to recognize "disconnected" cubes; roughly speaking, 


the problem is this: in some way I manage to know that A QC TY 
looks like it is going to be a cube ( see also fig HIDDEN) Cc 
380 I would like to look for a corner of the forn YY in the 
direction Q C. That corner happens to be U WT V----- » UNev 
at the bottom, but in order to find it I have to continue the line Q C 
for a while, and stop after finding WT, which is the continuation. 

We could use the scheme of trying to extend all lines that seem to 
be stopped--like QC,TW --, making the picture somewhat transparent. Also, 
when looking for corner T, we could extend slowly the line QC, and every 
2 millimeters or so ask : Have I hit a point yet ? 

Instead of that, we use the opposite approach: look for the points
(corners) which exist, and see which of them may be continuations of Q C.
But it would be better not to look at all of them, but just at the most
promising ones. That is what CUBS does.

The vertices may be CORNERS, Y's, T's or ANY's.

The program classifies the vertices of the picture into several
categories:

CORNERS: With this name we denote vertices at which two lines arrive,
for example U, A, I, r*, etc. in HIDDEN (fig. ).
Y's: Three lines meeting at a point, no two of them collinear.
T's: Three lines meeting at a point, two of them collinear; B*, W, K*,
L*, M*.
ANY's: Vertices having more than 3 lines.

What the program CUBS does is divide the vertices into CORNERS,
Y's, T's and ANY's. The Y's are also classified into classes, according to


The cubes A Q V U T and A* H are disconnected.
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the slope of its sides 

After this, all the Y's of a particular cube can be found in a given
class; if it happens that there are no parallel cubes, as in STICKS
(fig. ), then you simply print the classes, because each class contains
exactly one cube.
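
The classification just described can be sketched in a few lines. This is only an illustration in Python, not the CONVERT code of CUBS; the names `direction`, `classify` and `y_classes`, the coordinate table and the tolerance are assumptions made for the example.

```python
from math import atan2, degrees

def direction(p, q, coords):
    """Undirected slope of the line p-q, as an angle in [0, 180) degrees."""
    (x1, y1), (x2, y2) = coords[p], coords[q]
    return degrees(atan2(y2 - y1, x2 - x1)) % 180.0

def classify(vertex, neighbors, coords, tol=1e-6):
    """Return 'CORNER', 'Y', 'T' or 'ANY' from the lines meeting at vertex."""
    dirs = [direction(vertex, n, coords) for n in neighbors]
    if len(dirs) == 2:
        return "CORNER"
    if len(dirs) > 3:
        return "ANY"
    if len(dirs) == 3:
        for i in range(3):
            for j in range(i + 1, 3):
                d = abs(dirs[i] - dirs[j])
                if min(d, 180.0 - d) < tol:   # two lines share a slope: collinear
                    return "T"
        return "Y"
    return "OTHER"                            # fewer than two lines

def y_classes(scene, coords):
    """Group the Y's of a scene by the slopes of their three sides."""
    classes = {}
    for v, nbrs in scene.items():
        if classify(v, nbrs, coords) == "Y":
            key = tuple(sorted(round(direction(v, n, coords), 3) % 180.0
                               for n in nbrs))
            classes.setdefault(key, []).append(v)
    return classes

# Hypothetical data: two Y's with parallel sides, far apart in the picture.
coords = {"P": (0, 0), "a": (0, 1), "b": (1, -1), "c": (-1, -1),
          "Q": (10, 0), "d": (10, -1), "e": (9, 1), "f": (11, 1)}
scene = {"P": ["a", "b", "c"], "Q": ["d", "e", "f"]}
print(y_classes(scene, coords))
```

Both Y's fall in the same class because their sides have the same three slopes, which is exactly the situation in which CUBS cannot tell parallel cubes apart.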

That is not the complete solution. When a single class contains just
one vertex, such as G or M* (STICKS, fig. ), it may or may not be part of
a cube. CUBS makes a further analysis and,
depending upon the kind of vertices attached to the lines forming that Y,
an acceptance or rejection is made. For information purposes, a message
"FALSE CUBE FOUND" is issued.

For example, analyzing the points attached to H, XM and F, the "Y" G
is accepted as a cube; analyzing the points N*, T* and L*, the point M* is
rejected, that is to say, M* (T* L* N*) is not
part of a cube.
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This is the solution for STICKS (last page), as the program CUBS does it: 


(CORNERS == ZM (QM Q) RM (M N) LM (KM MM) IM (W JM) I* *H* 
J*) F* (E* Gk) D* (C*® E*) B® (C® AX) Y (X YM) U (T V) K (J 
L) H (G I) F (G E) D (C £) A (B J*)) 

(TES = A/ (H* DM G*) YM (GM Z Y) XM (P R* G) WM (C B VM) UM 
(Q* E VM) TM (SM I 0) MM (LM GM V*) JM (KM IM NM) FM (Z —&M 
Y*) EM (W* FM X*) BM (AM CM DM) AM (Z* BM E*) Z* (K* AM H*) 
Y* (EM X* A*) X* (EM W* Y*) We (EM X* V*) V* (S* MM We) S* 
(R* T* V*) R* (Q* S* XM) Q* (UM P* R*) P* (N* TX Q*) K* (L* 
ze J*) J* (KX I* A) A*® (Z Y* B*) Z (YM A*® FM) W (IM V X) Q 
(ZM PR) P (Q XM 0) O (N '™ P) N (RM QM O) M (RM OML) I (H 
J T™) E (F D UM) C (D B WM)) 

(FALSE CUBE ( 0.30000002E1 0.5e0 -0.0 N*® (VM P*® M*))) 
(FALSE CUBE ( 0.2E1 0.0 ~-0.3333333 M* (T* L¥ N*))) 

(CUBE 1 IS (U* (T* L¥® C*) L* (U* M* K*) CX (B* U* D*))) 

(CUBE 2 IS (KM (JM HM LM) HM (KM X GM) GM (MM HM YM) X (W Y HM))) 
(CUBE 3 IS (NM (T V JM) H* (A/ Z* I*) V (U NM W) T (S NM U))) 
(FALSE CUBE (0.0 -0.1El -0.2El S$ (T OM R))) 

(CUBE 4 IS (QM (N PM ZM) PM (OM QM R) OM (M PM S) R (S PM Q))) 
(CUBE 5 IS (VM (WM N* UM) SM (J L TM) L (SM KM) J (I SM K))) 
(CUBE 6 IS (G F H XM)) 

(CUBE 7 IS (CM (BM DM E*) G* (DM F* A/) B (A WM C))) 


(ANYS = DM (BM CM A/ G*) T* (S* U* M* N*) E* (AM D* CM F*)) 


We print, as additional information, the CORNERS and the T's. Note that
only a small part of each cube is printed; for example, of the long
horizontal cube, only vertices G, F, H and XM are printed. It is not
difficult to "fill" the cube, as CUBES2 does (1), but CUBS does not do that,
if for no other reason, because we already know how to do it, so it is
just a matter of adding that part of the program.
Also, CUBS does not use any information about CORNERS; we will
need it in more complicated cases.

(1) See Polybrick memorandum [13].


Shortcomings of CUBS. 


I think the most serious one is that it is unable to distinguish
among parallel cubes; for example, cubes A Q U T and G* F* J* H*
in fig. 'HIDDEN' (page 29) are confused and reported as just one, since
they lie in the same class. A better (or worse) example is fig. 'COMMON'
(page 34), where all four cubes are parallel, and the program thinks
there is just one. Also, the program does not check for length of edges.

Let us not get angry at CUBS. It is obvious that the program is
incomplete, and it is also obvious what should be done.

The main good idea in CUBS is that, by dividing the cubes into
classes, we transform the problem of finding all the cubes into the
problem of finding the cubes in a given class, in which all of them are
parallel. This approach also solves the disconnectivity problem.


CUBA LISP. Differentiating among parallel cubes.- 


The program just discussed takes a figure and separates the cubes
into classes, each one of them containing parallel cubes. For example, in
fig. 'HIDDEN' (page 29), the cubes A Q V U T and E* J* D* J H* are
parallel. We would like to differentiate among them. Here we use the
collinearity among two vertices; for example, Q and T are collinear
(see the figure), but Q and D* are not, so Q
and D* cannot form a cube.
Also, we do not want to compare Q with all the
vertices of its same class in order to select the possible
ones; it seems that a further classification
of vertices of the same class is desirable.

Collinearity is not sufficient. For example, vertices A and B (see
figure) are collinear, and still do not form a cube; therefore,
we will select all the vertices collinear to A in the direction AT and
(if there are some) select the appropriate one.


Numbering the Y's. Unit distance vertices.- 


Take a cube, pick any vertex and establish the three directions
of its lines, as done in the figure. Now examine, for each vertex, the
lines which depart from it.

For vertex A, all its lines depart in the positive directions;
therefore, A is (+ + +) or (0 0 0). For vertex B,
     line B G is in a positive direction (+)
     line B C is in a positive direction (+)
     line B A is in a negative direction (-)
therefore, vertex B is (+ + -) or (0 0 1) or 1. When we finish this
process, each vertex of our cube carries a number, as in the figure.

This numbering scheme is independent of the starting vertex (0 0 0)
and of the directions which are considered positive.


Connected vertices are unit- 
distant, that is, their binary words differ in exactly one bit. Vertices 
which are 2 units apart lie on the diagonal of the faces (AE, AG, BH, 
etc.) and vertices lying in opposite extremes of the diagonals of the cube 
are 3 units apart, for example F (100) and C (011). 
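
The unit-distance idea is the ordinary Hamming distance between 3-bit words. A minimal sketch in Python (the function name `distance` is ours, and only the codes for A, B, C and F are taken from the text):

```python
from itertools import product

def distance(a, b):
    """Number of positions in which two binary words differ."""
    return sum(x != y for x, y in zip(a, b))

A, B, C, F = (0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0)

print(distance(A, B))   # connected vertices are unit-distant
print(distance(F, C))   # opposite ends of a diagonal of the cube

# The 8 words of three bits have 12 unit-distant pairs: the 12 edges of a cube.
words = list(product((0, 1), repeat=3))
edges = [(u, v) for i, u in enumerate(words) for v in words[i + 1:]
         if distance(u, v) == 1]
print(len(edges))
```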


Pre-processing (1).- The pre-processing done in CUBA is more complicated
than the one done in CUBS.
Vertices are divided into CORNERS, T's, Y's and ANY's (as before);
1. CORNERS are divided according to the slope of the sides.

2. T's are divided according to the slope of the top and
the slope of the tail.

3. Y's are divided into classes, according to slopes.
In each class, vertices are divided according to the
unit distance concept. If a certain vertex happens to
be the first of a given class, the number (0 0 0) is
assigned to it.


Localization of the cubes.- A second part of CUBA applies to each class 
of Y's the following process: 
1. A vertex is selected and the program tries to attach to it a
cube, if possible; therefore, its unit-distance vertices are looked
for [if the vertex in question has number $(x_1, x_2, x_3)$, only
subclasses $(x_1', x_2, x_3)$, $(x_1, x_2', x_3)$ and $(x_1, x_2, x_3')$
are searched (*)];
a vertex has to pass the test for collinearity and, if several
are found, the closest is chosen. It turned out that these 3
tests are still not sufficient; for example, B is

(1) unit-distant from A

(2) collinear

(3) the closest

and still A - B is not (part of) a cube.

In relation with this, see also fig. 'TOWER'. 


2. We apply to the vertices found in (1.) the same process (1.), 
up to a certain depth. 


3. The cube formed in this way is accepted if it has two or more
vertices; if it has one, as N* (K* L* M*) in fig. 'HIDDEN', page
29, we check the extreme points [K*, L* and M* in the example],
as explained in CUBS.

A fancier program should say, after finding a cube such
as N*: "I am not sure it is really a cube, but it looks like one".
This comment can be inserted in this part of the program.


4. Accepted cubes are reported and their vertices erased from 
the subclasses where they were found, and the whole process is 
applied again to the next vertex of the subclass. 


5. When a subclass (or a class) is empty, the next one is searched. 
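
Step 1 of this process can be illustrated as follows. The sketch is in Python and is not the code of CUBA: `unit_distant_codes` lists the three subclasses to search by complementing one bit at a time, and `closest_collinear` is an assumed, simplified reading of the collinearity test (the candidate must lie on the prolongation of a given line, and the nearest one wins). Coordinates are hypothetical integers, so the comparison with zero is exact; real data would need a tolerance.

```python
def unit_distant_codes(code):
    """The three words that differ from code in exactly one bit."""
    return [tuple(1 - bit if i == k else bit for i, bit in enumerate(code))
            for k in range(len(code))]

def closest_collinear(vertex, toward, candidates, coords):
    """Nearest candidate lying on the ray from vertex through toward."""
    vx, vy = coords[vertex]
    dx, dy = coords[toward][0] - vx, coords[toward][1] - vy
    best = None
    for c in candidates:
        cx, cy = coords[c][0] - vx, coords[c][1] - vy
        on_line = dx * cy - dy * cx == 0      # zero cross product
        ahead = dx * cx + dy * cy > 0         # same side as the line
        if on_line and ahead:
            d2 = cx * cx + cy * cy
            if best is None or d2 < best[0]:
                best = (d2, c)
    return best[1] if best else None

print(unit_distant_codes((0, 0, 1)))

coords = {"v": (0, 0), "t": (1, 1), "p": (3, 3), "q": (5, 5),
          "r": (2, 3), "s": (-2, -2)}
print(closest_collinear("v", "t", ["q", "r", "s", "p"], coords))
```

As the text warns, a vertex that passes all three tests may still not belong to the cube, so a result of this kind is only a candidate.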


CUBE LISP. 


Is the program currently in use; in addition to what CUBA does, it
also breaks vertices having four lines, two of which are collinear,
in two Y's (see fig. 'TOWER').

(1) This should not be confused with the kind of preprocessing of chapter 3.

(*) $x' = 1$ if $x = 0$; $x' = 0$ if $x = 1$.
Recognition of Cubes in a Picture which also contains other Objects.-


In the presence of non-cubic objects, an effort is made by the
program to see cubes in them; if none is found, these objects are simply
ignored. A good example is fig. 'HIDDEN', page 29, where the truncated
pyramid is ignored, but only after several "false cubes" are found in it.
The output is the following:

(FALSE CUBE (Z* (Q* Y* S*))) 

(FALSE CUBE (¥* (V* Z* X*))) 


(FALSE CUBE (X* (W* O* Y*))) 
SOLUTION TO HIDDEN 


(FALSE CUBE (S* (T* Z* S*))) 

(FALSE CUBE (Q* (Z* P* R*))) 

(CUBE 1 IS (N* K* L* M*)) 

(FALSE CUBE (X (H Y B))) 

(FALSE CUBE (J (I K H*))) 

(CUBE 2 IS (H® (Gk F* J) E* (F* G* CX) F* (E* H* D*) D* (WK F*))) 
(CUBE 3 IS (P (AQ R) 0 (QAN) Q (OPC) T (UV W)) ) 

(CUBE 4 IS (L (A* B¥ M) Z (MN A*) M (Z DL) H (B X K¥)) ) 

(FALSE CUBE (B* (Y* U* W*))) 

(CUBE 5 IS (¥ (D X I*) G (P* I* B) I* (EG Y) E (I* O* S)) ) 


(FALSE CUBE (D (¥ M S))) 


If instead of a pyramid we put a hexagonal prism, it will recognize in
it the "cubes" ABCEFG and BCDFGH!
As you see, CUBE is not very successful in a
foreign environment. A more general program should be
more careful about accepting candidates which look
good.
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Some Examples. 


We have already shown several figures which the program analyzes
correctly; they are COMMON (page 34), GORDO (page 23), HIDDEN (page 29),
STICKS (page 31). Some of them, like HIDDEN (page 29), are somewhat
complicated, since they involve parallel cubes, disconnected cubes, 1-corner
cubes, extraneous objects, etc.

I would like to present now a couple of examples, TRICKY (fig.
'TRICKY'), and WHAT? (fig. 'WHAT?'), where the answer is ambiguous
(non-unique). The program does its best, and its answers are acceptable but,
in general, CUBE is not designed to solve optical illusions.


load ((a cube tricky)) 
(CERO UNDECLARED) 
(CERO UNDECLARED) 
NIL 
e (cubs tricky) 


(THERE ARE 2 OR 1 CUBES) 

(CUBE 1 is (M (B DL) B (MC A)) ) 

(CUBE 2 IS (J (KI P) P (LH J)) ) 

(FALSE CUBE (O (H L D)) 

(CUBE 3 IS (F (ENG) N (DF AH)) ) 

CUBE accepts the 3 exterior cubes and rejects O (H L D). Now we apply it
to the scene WHAT?:


(CUBE 1 IS (0 (P Y X) KX (QW0) Q (KRP)) ) 
(CUBE 2 IS (S (DT R) D (S EC)) ) 

(CUBE 3 IS (Q (BRP) B (QC A)) ) 

(CUBE 4 IS (M (YL Z) K (J ZL) Z (WK M)) ) 
(CUBE 5 IS (H (GUI) U (TH J)) ) 

(CUBE 6 IS (M (Y NZ) Y (MO W) O (NY X)) ) 
(FALSE CUBE (V (J R T))) 


(FALSE CUBE (C (RB D))) 


These are the results for WHAT? (page 40). 6 cubes are found;
M Y O W is accepted, but J V R T is not. This is (see figure 'WHAT?')
certainly a possibility; otherwise, how does one explain with cubes the
presence of lines ON and NM?

The next page shows the scene 'TOWER'. All the cubes but one are
correctly identified; cubes C* T and P I are [con]fused and they appear
in the answer as only one, namely C* N* A* I. This is because we do not
use information about lines; lines P* Q and R* T (see page 42) would solve
the problem.


e (cubs tower) 

(THERE ARE AT LEAST 3 OR 2 CUBES) 

(CUBE 1 IS (A*® (X I X*) X* (Z 0 A*)) ) 

(CUBE 2 IS (X* (Z 0 U*) U* (H S X*)) ) 

(CUBE 3 IS (V (C T* B) F* (T* C C¥) T* (F* V N*) N® (C*# R T*)) ) 
(CUBE 4 IS (N* (C* R P*) A* (X I W*)) ) 

(CUBE 5 IS (F (Y MN) N (G* L F)) ) 

(CUBE 6 IS (N (G* L K) K (D* G N)) ) 

(CUBE 7 IS (K (D* G J) J (H* V* R)) ) 

(CUBE 8 IS (U* (H S F) F (Y¥ M U*)) ) 

(FALSE CUBE (W* (D Y* A*))) 

(CUBE 9 IS (E (W Q* J*) E* (J* BR) U (B J* Q*) J* (E* U E)) ) 
(FALSE CUBE (E (W Q* T))) 

(CUBE 10 IS (W* (D ¥* A) I* (A QD) R* (P* A S*) A (I* R* WH)) ) 
(FALSE CUBE (P* (R* Q N*))) 

(FALSE CUBE (B (U E* V))) 
Fig. 'TOWER'. Vertices such as K (D* N G J) having 4 connected points,
two of which (J and N) are collinear, get decomposed in two Y's: K (D* N G)
and K (D* J G).
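
The decomposition in the caption can be sketched as below. This is an illustrative Python fragment, not the code of CUBE; the function name `split_vertex` and the integer coordinates given to K, D*, N, G and J are assumptions chosen so that N and J are collinear through K.

```python
def split_vertex(center, neighbors, coords):
    """Split a 4-line vertex with two collinear neighbors into two Y's.

    Returns the two neighbor lists, each keeping one of the collinear
    pair, or None when no two neighbors are collinear through center.
    """
    cx, cy = coords[center]
    for i, a in enumerate(neighbors):
        for b in neighbors[i + 1:]:
            ax, ay = coords[a][0] - cx, coords[a][1] - cy
            bx, by = coords[b][0] - cx, coords[b][1] - cy
            opposite = ax * by - ay * bx == 0 and ax * bx + ay * by < 0
            if opposite:
                return ([n for n in neighbors if n != b],
                        [n for n in neighbors if n != a])
    return None

coords = {"K": (0, 0), "D*": (0, 2), "N": (-2, -1), "G": (1, -2), "J": (2, 1)}
print(split_vertex("K", ["D*", "N", "G", "J"], coords))
```

Since the neighbors of a vertex are not ordered, the second list, (D* G J), denotes the same Y as K (D* J G).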
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CHAPTER V. T D 


This chapter describes a program (1) written in the CONVERT language,
and run in the IBM 7094 system, called TD. The function of TD is to accept
a scene expressed as a symbolic expression (cf. chapter 3) and a model (cf.
chapter 7), expressed in the same notation, and to find all instances of
this model in the scene.

The symbolic notation for expressing scenes and models uses a language 
called FDL-1 (figure description language-one). The notation restricts the 
present application of the system to scenes and models which are made of 
straight line segments. Models can be described independent of position, 
size, rotation, reflection, etc. 

The program TD is particularly well-suited for non-overlapping figures.
Overlapping figures are identified only when they are transparent.

Either two or three-dimensional models and scenes can be represented
in our notation. Furthermore, the program TD will handle three-dimensional
scenes and models as readily as two-dim ones. That is, we can compare 2-dim
scenes with 2-dim models, or 3-dim scenes with 3-dim models (both cases
describable in FDL-1); this last case is rather rare, due to the difficulty of
obtaining 3-dim scenes to analyze. On occasion, it is possible to analyze
correctly a scene which is the two-dimensional representation of a 3-dim
scene by using only two-dimensional models.

(1) TD was developed by the author at Computer Corporation of America,
Cambridge, Mass., under contract AF 19(628)-5914.
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SECTION I. The Figure Description Language FDL-1. 


Introduction. 


The figure description language FDL-1 is a language in which
figures and models may be represented in a symbolic notation. For the
purposes of FDL-1 we restrict ourselves to figures and models which are
made of straight-line segments.

Formally, there is no distinction in the language between a figure
and a model. Informally, we will use the term "figure" to mean a certain
specified picture (which may or may not be fixed in position), whereas we
will use the term "model" to refer to a class of pictures, such as the
class of squares for example, or the class of chemical formulae containing
one benzene ring.


THE LANGUAGE 


Points. 


Points are the building blocks of further structures. A point is
represented by an atom. Example: A, B and COC are points (see fig. 'POINTS').

Fig. 'POINTS'. The simplest element of a figure is a point.


Vertex.


A vertex is a point followed by a (non-ordered) list of points,
called the neighbors of such a point. A vertex has no repeated neighbors,
and is not a neighbor of itself.

Examples: A (B C D) is a vertex; B, C, and D are the neighbors of A.
A (B D C) is a vertex.
A (B C C) is not a vertex.
A (B C A D) is not a vertex.


Elementary Figures.


A list of vertices satisfying the following constraints is called an
elementary figure:

1. Each point mentioned occurs exactly once as a vertex.

2. Neighborhood is a symmetric property; that is, the occurrence
of ... A (... B ...) ... in the figure means that ... B (... A ...) ...
must also be present.

Elementary figures are sometimes called the connection matrix or
connection list. The order in which the vertices are mentioned is irrelevant.
Example: (A (B C) B (C D A) D (B C) C (A B D))
is an elementary figure. See figure below.

Note that the vertex A (B C) is different from the figure
(A (B C) B (A) C (A)). See figure 'THREE'.
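
The rules for vertices and elementary figures are easy to check mechanically. The following Python sketch is an illustration only; it represents a figure as a list of (point, neighbors) pairs, which is our own encoding and not the FDL-1 input format.

```python
def is_elementary_figure(figure):
    """Check the vertex rules and the two constraints on elementary figures."""
    names = [point for point, _ in figure]
    if len(set(names)) != len(names):          # a point given twice as a vertex
        return False
    table = dict(figure)
    for point, nbrs in figure:
        if len(set(nbrs)) != len(nbrs):        # repeated neighbor
            return False
        if point in nbrs:                      # neighbor of itself
            return False
        for q in nbrs:
            # every point mentioned must be a vertex, and
            # neighborhood must be symmetric
            if q not in table or point not in table[q]:
                return False
    return True

quad = [("A", ["B", "C"]), ("B", ["C", "D", "A"]),
        ("D", ["B", "C"]), ("C", ["A", "B", "D"])]
print(is_elementary_figure(quad))
print(is_elementary_figure([("A", ["B", "C"])]))   # the lone vertex A (B C)
```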


Fig. 'THREE'. (a) The elementary figure (A (B C) B (A) C (A));
(b) point A; (c) vertex A (B C).


Example: (A () B () C ()) is an elementary figure (see figure below). 
Example: ((A (B C) M (N S))) is not an elementary figure. 


Example: (A B (C) D (116. 0.563)) is not an elementary figure. 


This elementary figure shows the
fact that the neighbors of a given
point P are the other vertices to
which lines from P are drawn.
Compare with figure 'THREE'.
Example: (A (B C) C (B A E) D () B (C A) E (C)) is an elementary figure
(see figure below).

Fig. Two figures (a) and (b) may be described
by the same elementary figure in FDL-1:
(A (B C) C (B A E) D () B (C A) E (C)).

Properties. 


Elementary figures describe only the topology of the connection
between the different vertices of the object (1); in order to characterize
further the scene or model in question, we modify (we restrict, in fact)
this topological skeleton by specifying properties which the figure has
(to have).

A property is an ordered list whose first atom has been declared to
be of the type "property name" (i.e., it is not a point), and whose
remaining elements are atoms representing points, or numerical constants.

A property is simply a predicate, i.e., an expression with open variables,
such that the expression becomes T or F upon substitution for the variables.

Examples: (LENG C B 4.0)
(ANGLE A B C 75)

(1) Evans [8] uses a similar scheme for the representation of his figures.


Attachment of Properties to Figures. 


Given a figure, a new figure may be formed by attaching to it a
collection of properties, using the connector 'where'. An example will
illustrate the syntax.

(A (B C) B (A C) C (A B))                    (1)

(SLOPE A B 3.0) (LENG C A 5.0)               (2)

Expression (1) represents an elementary figure with three vertices;
expression (2) represents two properties.

A new (non-elementary) figure may be formed by saying:

((A (B C) B (A C) C (A B)) where ((SLOPE A B 3.0) (LENG C A 5.0)))    (3)

In the example, (1) is any triangle, and (3) is any triangle with a side
of length 5.0 and the adjacent one with slope 3.
Example: ((M (N R) Q (N R) N (M Q) R (M Q)) where
((LENG N R 8.5)
(SLOPE M Q 0.0)))
represents a quadrilateral with a horizontal diagonal, the other being
8.5 units long.

Therefore: A figure may be formed by a list containing 3 elements:
1. a figure
2. the connective where
3. a list of properties.
Open variables.


The properties (SLOPE A B X)
(SLOPE M N X)
where X is not a point, but an explicitly declared "open variable", are
interpreted as saying that the slope of AB is X and the slope of MN is
also X, whatever X may be. In short, the lines AB and MN are parallel.
In general, open variables are used when we do not want to commit ourselves
to specific values, but insist only that the value be the same each time
the variable is encountered.
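
The role of an open variable can be shown with a small sketch. It is written in Python and is not how TD is implemented: `satisfy`, `slope`, the tuple encoding of properties and the coordinates of A, B, M and N are all assumptions made for the illustration. The first time a variable is met it is bound to the computed value; every later occurrence must agree with that binding.

```python
def slope(coords, p, q):
    (x1, y1), (x2, y2) = coords[p], coords[q]
    return (y2 - y1) / (x2 - x1)     # vertical lines are not handled here

def satisfy(properties, variables, functions, tol=1e-9):
    """Return the variable bindings if all properties hold, else None."""
    bindings = {}
    for name, *points, last in properties:
        value = functions[name](*points)
        if last in variables:                    # open variable
            if last not in bindings:
                bindings[last] = value
            elif abs(bindings[last] - value) > tol:
                return None
        elif abs(value - last) > tol:            # numerical constant
            return None
    return bindings

coords = {"A": (0, 0), "B": (2, 1), "M": (0, 3), "N": (4, 5)}
functions = {"SLOPE": lambda p, q: slope(coords, p, q)}
parallel = [("SLOPE", "A", "B", "X"), ("SLOPE", "M", "N", "X")]
print(satisfy(parallel, {"X"}, functions))
```

Here AB and MN both have slope 0.5, so the two properties are satisfied with X bound to that value.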


In order to distinguish open variables from points and property
names, open variables are declared as such using VARIABLES, which is a
special property. Thus, the expression

(VARIABLES ALPHA THETA CAMOTES ....)
defines the atoms ALPHA, THETA, CAMOTES, ..., to be open variables.
These variables are considered open only with respect to the figure which
they modify. An example of a figure containing open variables is as
follows:

((P (Q R) Q (P R) R (P Q)) where
((LENG P Q L1)
(LENG Q R L1)
(LENG P R L1)
(VARIABLES L1)))
This figure represents an equilateral
triangle. A second description may be:
(((P (Q R) Q (P R) R (P Q)) where
((ANGLE P Q R ALPHA)
(ANGLE R P Q ALPHA)
(VARIABLES ALPHA))) where
((ANGLE P R Q ALPHA)
(ANGLE P Q R ALPHA)
(VARIABLES ALPHA)))


Composition of Figures. 


Boolean connectors may be used to form new expressions. For example:
(=OR= (A (B C) B (A C) C (A B))
(A (B C) B (A D) D (B C) C (A D)))
is a representation of a triangle or a certain type of quadrilateral.


Definitions: Single Names.


An operator is now introduced which allows us to give to a whole
figure a single name. This operator is

(=DEF= name fig)
where name is an atom not previously used as either a point, an open
variable, or a property name, and fig is a figure. After such a statement
has been executed, name and fig are completely equivalent and
interchangeable.

=DEF= allows us to set single atoms to stand for whole figures.
For example, assume we have a quadrilateral, i.e.,
(A (B D) B (A C) C (B D) D (A C))
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Fig. ‘QUADRILATERAL’. A model. 


We may define the atoms "QUADRILATERAL" and "QUAD" both to represent
that quadrilateral as follows.
(=DEF= QUADRILATERAL (A (B D) B (A C) C (B D) D (A C)))
(=DEF= QUAD QUADRILATERAL).

A parallelogram may now be defined as a quadrilateral having two equal,
non-adjacent, and parallel sides, such as AB and DC (not CD). For example:
(=DEF= PARALLELOGRAM (QUAD where
((SLOPE A B S)
(SLOPE D C S)
(LENG D C L)
(LENG A B L)
(VARIABLES S L))))

Fig. 'PARALLELOGRAM'. Parallelogram defined with the help of the
model of figure 'QUADRILATERAL'.

(1) This feature is considered important by Sutherland in Sketchpad [34].
We may now define a rectangle as a parallelogram having one right angle:
(=DEF= RECTANGLE (PARALLELOGRAM where
((ANGLE D A B 90°))))

Of course, we could have made some of the definitions in a different
manner; for example,
(=DEF= PARALLELOGRAM ((A (B D) B (A C) C (B D) D (A C)) where
((SLOPE A B X) (SLOPE A D Y)
(SLOPE D C X) (SLOPE B C Y) (VARIABLES X Y))))
The above defines a parallelogram to be a quadrilateral with opposite
sides parallel two by two.

A rectangle may be defined as a parallelogram constrained to have
its two diagonals of equal length, as follows.
(=DEF= RECTANGLE (PARALLELOGRAM where
((LENG A C Z)
(LENG D B Z) (VARIABLES Z))))
Note that properties are not attached to lines, but to figures; for
instance, (LENG A C Z) is a well-defined property, even when the figure
does not contain line AC.


Definitions of New Properties. 


New properties may be defined at will. In order for the recognition
program to take proper action in regard to the new properties, these must
be defined before use in terms of LISP functions.

A property $(P\ A_1\ A_2\ \ldots\ A_{n-1}\ A_n)$ is handled by producing a call to
the LISP function $(P\ A_1'\ A_2'\ \ldots\ A_{n-1}')$, where $A_i'$ is the expression
obtained by replacing $A_i$ by its value (with respect to TD); the value
of this LISP function is then computed and compared with $A_n$ (1), yielding
a match or a failure; furthermore, when the mentioned value is T,
this comparison does not take place, and TD handles this case as if a
successful match had occurred.

For example, suppose we want to define the property EQUILATERAL,
function of m sides $A_1B_1, A_2B_2, \ldots, A_mB_m$, and we will say that a
figure has such property if each length $A_iB_i$ falls within $\pm$ EPS of the
average length
$$\overline{AB} = \frac{1}{m}\sum_{i=1}^{m} \operatorname{length}(A_iB_i)$$
where EPS is some pre-specified tolerance. We must write a LISP function
of name EQUILATERAL, of 2m (not 2m+1) arguments, whose value is, for
instance, YES if the arguments [whose values are the points forming the
sides of the figure in question] fulfill the appropriate requirements; this
function should check its arguments to see if some of them is not a point,
in which case it should return T; otherwise (failure), its value must be
different from YES or T. One could then write
(=DEF= SQUARE ((A (B D) B (A C) C (B D) D (A C)) where
((EQUILATERAL A B B C C D D A YES))))
The user is able to define properties as complicated as he wishes, since
properties are functions (predicates) of several variables, the variables
being the (coordinates of the) vertices, and the values obtained by
different UAR variables: slopes of lines, distances, etc. Therefore, arbitrarily
complex restrictions may be specified, and models can have fairly elaborate
properties or constraints between their different elements.
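
To make the convention concrete, here is what a function like EQUILATERAL might look like. The sketch is in Python rather than LISP; the representation of a point as an (x, y) tuple, the strings "YES", "T" and "NO" used as return values, and the default tolerance are assumptions of the example.

```python
from math import dist

def equilateral(*args, eps=1e-6):
    """2m arguments: the end points A1 B1 A2 B2 ... Am Bm of m sides."""
    if not args or len(args) % 2:
        raise ValueError("expected a non-empty, even number of arguments")
    # An argument that is not a point (e.g. still unbound) yields T,
    # which is treated as a successful match.
    if any(not (isinstance(p, tuple) and len(p) == 2) for p in args):
        return "T"
    sides = [dist(args[i], args[i + 1]) for i in range(0, len(args), 2)]
    mean = sum(sides) / len(sides)
    ok = all(abs(s - mean) <= eps for s in sides)
    return "YES" if ok else "NO"      # anything other than YES or T is a failure

A, B, C, D = (0, 0), (1, 0), (1, 1), (0, 1)
print(equilateral(A, B, B, C, C, D, D, A))
```

The value returned would then be compared with the last element of the property, YES in the SQUARE example.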


(1) This comparison is done by RESEMBLE (see CONVERT [14]).
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SECTION II. The Program 


What the Program does 


As already mentioned, the recognition program called TD accepts a
description of a scene expressed in the notation of section I and a
description of a model expressed in the same notation. It will produce, as
output, the instances of the given model in the given scene.

TD operates in several modes which are set by 'switches'. There are
three switches called EXACT, ALL and SYMMETRIC.

EXACT can have one of two values: T and (). A value of T for EXACT is the
normal mode. In this mode, an object will be said to match the model only
if the vertices of the object have exactly the same number of lines as
are specified for the corresponding vertices in the model. Thus, ABC will
be recognized as a triangle in fig. y-16-m and in fig. y-16-n, but not in
fig. y-16-p.

Fig. y-16 (m), (n), (p). The small triangle ABC in (p) is not recognized
when EXACT is T, but it will be when EXACT is ().

When the value of EXACT is (), the object will be recognized as matching
a model even if there are more lines occurring at a given vertex than those
specified in the model. Thus, ABC in fig. y-16-p will also be recognized as
a triangle.


The next switch is called ALL. It takes one of three values: T, () and 69.
The normal setting is T. In this state, the program will identify a certain
portion of the scene, erase that portion, and then operate on the remainder.
The program terminates when the scene no longer contains parts which may
be identified as the model in question.

Under the setting (), the program stops after having identified the first
instance of the model.

Under the setting 69, the program will do an exhaustive analysis of the
scene in terms of the given model. Thus, for example, in fig. 'ALL' the
program will find two rectangles in the setting T, one rectangle in the
setting (), and 12 rectangles in the setting 69. The 12 are all the
permutations of the three rectangles ABCD, NMLK and SNRD.

The third switch is called SYMMETRIC. It
has two values: T and (). The setting T
is to be used when the current model is,
in fact, symmetric. In this case, the program
will operate faster than in the
mode (), the normal mode. In case the model is not symmetric and the switch
is set to T, the program will not behave correctly. See examples below.

Fig. 'ALL'.
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Fig. 'P27'. Scene analyzed in example 1. 
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Example 1, P27. 


Fig. 'P27' (page 56) shows a scene we want to analyze.
The symbolic description of such a scene, that is, the way it is assimilated
by the program, is the following:


(LAMBDA (A BC) (PUT A B (LLENA C))) 

(P27 SCENE (A 3. 9. (BD) B 1. 25. (AC) C 10. 25. (BD) 

Dil. 17. (AC) E 1l. 15. (F Z) F 12. 13. (EG) G 18.421052 

16.210525 (F Z UH) H 18. 12. (G I) I 23.833333 11.416667 

(HRQJ) J 21. 10. (1K) K 2.59. (JOPL) L 13. 9 

(K M) M 24, 1. (LB) N35. 9. (0 M) 0 28,.230769 9. (P K S N) 

P 24, 4. (KO) Q 28, 11. (R I) R 26.263158 13.631578 (I T S Q) 

S 35. 17. (RO) T 29, 21. (UR) U 28.166666 21.083333 

(X V TG) V 30. 22. (UW) W 29. 24, (XV) X 24. 21.5 (Y WU 2) 
Y 19. 22. (X Z) Z 18.68421 18.84210 (EY XG) )) 

PUT (QUADRILATERAL MODEL (A* (B* D*) B* (A* C*) C* (B* D*)
D* (A* C*)))

PUT (TRIANGLE MODEL (A* (B* C*) B* (A* C*) C* (A* B*)))


The last three rows define the models "quadrilateral" and "triangle".

Fig. Models "quadrilateral" and "triangle" used in
the analysis of fig. 'P27' (page 56).


We ask the program to look first for triangles, then for quadrilaterals:


(TRIANGLE 1 IS (J S P)) 

2 I8 (LN M)) 
((O (P K 8 N)) (0 (RI)) (R (ITS Q)) (TF WR) UCKVT 
G)) (V (U W)) (W(X V)) (X (YWU Z)) CY (X Z)) (Z (BY XG)) 
(A (B D)) (B (A C)) (C (B D)) (D (A G)) (EB CF Z)) @ (2 G)) 
(G (F ZU H)) (I (GI)) (I (ARQ J)) (KK GW OP L))) 
(QUADRILATERAL 1 IS (A B D C) 
(QUADRILATERAL 2 IS (E F WV) 
(QUADRILATERAL 3 IS (¥ T H Q) 
(RB (1 T 8 Q)) (S§ (RO)) (CU (KVTGE)) (K (WU 2Z)) (Z CE 
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¥XG)) (G (FZUH)) (I HROJ)) (J (1 K)) K (0 P L)) 
(L (KM)) (Mf (LN)) (N (0 M)) (0 (PK SN)) (P (K 0))) 


Now we change EXACT to (). When we look for quadrilaterals, the answer is 
(QUADRILATERAL 1 IS (A B D C)) 

(QUADRILATERAL 2 IS (E F Z G)) 

(QUADRILATERAL 3 IS (LO M N)) 

(QUADRILATERAL 4 IS (U X V W)) 

(CH (G I)) (1 (HRQJ)) GW (ZL R)) & GO PL)) (P &O)) 

(Q (RT)) (R (ITS Q)) (S (RO)) (T CU R)) (CX (XK Z))) 


Note that the analysis is consistent with the setting EXACT = () and
ALL = T; namely, we cannot identify other quadrilaterals in the remainder.
This answer was one of several possible matches, which could be discovered
by reordering the vertices of the scene and applying TD again, or by making
ALL = 69.


Looking for all (in the 69 sense) triangles: 


cset (symmetric nil) cset (all 69) cset (exact nil) 
td (triangle p27) 


(TRIANGLE 1 
(TRIANGLE 


(the program was stopped and did not finish). 
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Fig. 'SQUARE', 
Find all the squares in this picture. 
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Example 2. 'SQUARE'. 


(see fig. 'SQUARE', page 59). The symbolic scene is


(LAMBDA (A B C) (PUT A B (LLENA C)))

(SQUARE SCENE (A 1. 6. () B 1. 8. () C 2. 3. () D 3. 3. ()
E 3. 6. () F 4. 8. () G 4. 16. () H 6. 5. () I 12. 8. ()
J 12. 14. () K 15. 1. () L 16. 7. () M 17. 14. ()
N 22. 8. () ))


We will look for squares here, in the sense of sets of four points
which could be located at the corners of a square; the model in
question is


(PUT (SQUARE MODEL
((M* () O* () N* () P* ()) WHERE
((LENG M* N* L1) (LENG P* O* L1) (LENG N* O* L2) (LENG M* P* L2)
(ANGLE N* M* O* A1) (ANGLE M* N* P* A1) (ANGLE P* M* O* A1)
(ANGLE O* N* P* A1) (VARIABLES L1 L2 A1)))))


The answer is 


(SQUARE 1 IS (A D F H)) 
(SQUARE 2 IS (K M C G))
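
The answer can be checked independently of TD. The Python sketch below does not use the LENG and ANGLE model of the text; it swaps in a simpler test on the six pairwise distances (four equal sides, two equal diagonals, diagonal squared equal to twice the side squared). It is applied to eight of the scene's points, with coordinates as read from the scene listing, which is partly illegible in our copy, so the values should be taken as a reconstruction.

```python
from itertools import combinations

def is_square(p, q, r, s):
    """True if the four points are the corners of a non-degenerate square."""
    d2 = sorted((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                for a, b in combinations((p, q, r, s), 2))
    side, diagonal = d2[0], d2[5]
    return (side > 0
            and d2[:4] == [side] * 4          # four equal sides
            and d2[4] == diagonal             # two equal diagonals
            and diagonal == 2 * side)

def find_squares(points):
    """All sets of four named points that form a square."""
    return {frozenset(names)
            for names in combinations(sorted(points), 4)
            if is_square(*(points[n] for n in names))}

points = {"A": (1, 6), "C": (2, 3), "D": (3, 3), "F": (4, 8),
          "G": (4, 16), "H": (6, 5), "K": (15, 1), "M": (17, 14)}
for square in sorted(sorted(s) for s in find_squares(points)):
    print(square)
```

With integer coordinates the comparisons are exact; measured coordinates would call for a tolerance, in the spirit of EPS above.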


Example 3. 'X S'. 


We will now analyze a three-dimensional scene (see fig. 'XS', page 61),
or rather, a 2-dim view of a 3-dim scene. We are interested in objects
shaped like an "X" (see fig. 'EQUIS').

The operation is: 

CSET (SYMMETRIC ()) 


TD (EQUIS XS) 


(EQUIS 1 IS (I J H A F G E D C B))
(EQUIS 2 IS (K D1 L S O M N P Q R))
(EQUIS 3 IS (Z A1 U B1 V T Y W X C1))
NIL 


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Fig. "XS". 
Scene for example 3. Model is called 'X' (see fig. 'EQUIS') 
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Fig. 'EQUIS'. A model 
for fig. 'XS'.
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The NIL at the bottom is the remainder of the scene, as always. In this
case it is an empty remainder, i. e., the scene consisted only of the searched
objects (EQUIS).

Nevertheless, when we look for the object EQUIS (page 62) in the 
figure below, the program fails to identify it. 

This is due to the fact that the two dimensional representations 
are different. Chapter 7 discusses this in detail. 

A solution to this is to define a model (like the block in question)
as one of several models; FDL-1 has an =OR= for this effect.


Fig. d24. The model EQUIS (page 62) is
inadequate for the identification of this
drawing.
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Fig. 'CHEMIS'. This scene was
analyzed by TD using the models on
the next page (see example 4).
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Fig. 'NAPHTALENE'. A model.


Fig. 'ACENAPHTYLENE'. A model.
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Example 4. CHEMIS 
--see figures 'NAPHTALENE' (page 65), 'ACENAPHTYLENE' (page 65),
and 'CHEMIS' (page 64).--


Several chemical compounds were looked for in CHEMIS, page 64. The
results are given below.


(NAPHTALENE 1 IS (N O M F1 A1 G B1 S L A))

((P (H G1)) (O (¥ X)) (R (E2 M2 12)) (T (G1 H1)) (U (XI B2 
Z1)) (V (C2 A2)) (W (CE K)) (X (Q W1)) (Y (V1 T1 Q)) (Z (tl 
Ll N1)) (C1 (J1 Kl G1)) (D1 (12 32)) (El (M2 P1)) (GL (PT 
Cl)) (HL (Tl I1)) (Il (Hl K1)) (J1 (BC1)) (Kl (Cl I1)) (1 
(Ul Z)) (ML (F NL)) (Nl (Zz M1)) (O1 (L2 K)) (P1 (El F2)) (QL 
(D J)) (RL (1 C)) (SL (X1 YI Z1)) (TL (¥ WL Z)) (U1 (V1 L1)) 
(V1 (XY UL)) (WL (TL X F)) (Xl (S1 I U)) (¥1 (C S1)) (ZL (U 
$1 A2)) (A2 Zl V)) (B2 (U C2)) (C2 (B2 V)) (D2 (E2 G2 J)) 
(E2 (F2 D2 R)) (F2 (Pl E2 D)) (G2 (D2 H2)) (H2 (G2 12)) (12 
(R H2 D1)) (32 (K2 K1)) (K2 (M2 J2)) (L2 (Ol E)) (M2 (EL R 
K2)) (B (H J1)) (C (RL ¥1)) (D (F2 QL)) (E (W L2)) (F (WL M1)) 
(H (B P)) (I (R1 X1)) (J (D2 QL)) (K (W 01))) 


(ACENAPHTYLENE 1 IS (T1 Z Y W1 L1 N1 V1 Q X F M1 U1))
((X1 (S1 I U)) (Y1 (C S1)) (Z1 (U S1 A2)) (A2 (Z1 V)) (B2 ... etc., etc.
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CHAPTER VI. DT 


The present chapter describes a program, written in CONVERT and run on
the PDP-6 computer of Project MAC, M. I. T., which recognizes objects in
a scene. Two inputs to the program determine its behavior and response:
1. The scene to be analyzed, which is entered in a symbolic format,
called the region-format, somewhat different from FDL-1.
2. A symbolic description --the model-- of the class of the objects
we want to identify in the scene.
Given a set of models of the objects we want to locate, and a scene or
picture, the program will identify in it all those objects or figures
which are similar to one of the models, provided they appear complete in
the picture (i. e., no partial occlusion or hidden parts). Recognition is
independent of position, orientation, size, etc.; it strongly depends on
the topology of the model.
Important restrictions and suppositions are:
(a) the input is assumed perfect --noiseless-- and highly organized.
(b) more than one model is in general required for the description of
one object.
(c) partially seen objects may appear in the scene, but only objects
which appear unobstructed are recognized.


Work is continuing in order to drop restriction (c) and to relax (a).


A more complete description of DT is found in a Project MAC memorandum [16].


Relation of DT with other parts of this thesis.- DT represents the
implementation of a different approach to recognition; it works with regions,
instead of lines, as TD does. It is general, and may recognize any model
(within its limitations), instead of only parallelepipeds, as Polybrick does.
Its models are discussed in chapter 7. DT needs improvement to deal with
partially occluded objects.
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An example of recognition.- This chapter describes DT, a program which,
given a scene (such as 'EXAMPLE2') and a model (such as 'CUBE'),
will identify all 'parallelepipeds' present in 'EXAMPLE2'. In this case,
parallelepipeds 1 and 3 are found; parallelepiped 2 is partially hidden
and is not recognized. Both the scene and the model are in symbolic format.


Fig. 'EXAMPLE2'. Three parallelepipeds. 


Restrictions: In this first experimental system we will live with the
following constraints:

1.- Noiseless data is supposed, i. e., the scene must be accurately
described by its symbolic representation. Also, the set of shapes assumed
is small, so that we need not worry about heuristic efficiency in algorithms.

2.- Whenever a 3-dim object gives rise to several (2-dim) projections
which are topologically different, all these need to be presented as models
in order to cover the possible cases. The recognizer has an OR feature
for this effect. For instance, fig. 'L' has the same object in four
different positions, requiring 3 or possibly 5 models of an 'L' to identify all.
The exact number depends on the particular models in question and their
"don't-care" conditions, which may depend on what other objects in the
world have to be distinguished.


3.- Only objects which are totally seen are recognized. Partially
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occluded or hidden parts or bodies may be present in the picture, but the
occluded objects will not be identified. For instance, parallelepiped 2


Fig. 'L'
The same object in four different positions,
all of which differ in the topology of its
two dimensional projection over the plane of
the drawing.


in fig. 'EXAMPLE2' was not found. Our current work will help to relax this
last restriction, and also restriction (1). The reader unfamiliar with
progress in that direction can see references 4, 13, 24 and 29 for some
earlier work of that kind.

4.- In the present program we assume orthogonal projections. Later we
will consider finite perspective. For small visual angles, a simple tolerance
should suffice for most cases, but for large visual angles we will have to
use other methods.
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THE SCENE 


Informally a scene (picture) is a collection of regions (projections of 
faces); a region is described by an ordered collection of segments (lines 
or curves), and these have several properties. 

A scene is represented by an atom which has under the entry 'regions' a 
list of the regions composing it; for instance (see fig. 'BOTTLES'), the 
atom BOTTLES is a scene for which 

(GET (QUOTE BOTTLES) (QUOTE REGIONS)) = (A B C D E F G H I J K L M Z)

In this case the regions of 'BOTTLES' are A, B, ..., M, Z.

A region is an atom which has in its property list the entries NEIGHBOR,
SHAPE, and possibly others. A region corresponds to a surface or face in
the scene, except that it is treated 2-dimensionally; i. e., in fig.
'EXAMPLE2', the upper face of the eraser AB CEL is composed of two regions,
namely B and L.


Fig. 'EXAMPLE2'


Example.- In the property list of
region M (figure 'BOTTLES') we find:
NEIGHBOR (L Z)          (L and Z are the regions bordering M)
SHAPE ELLIPSE

At present, the shapes of regions can only be atoms; this is a severe
restriction, since it may be too much to require that the preprocessor
recognize region M (fig. 'BOTTLES') as an ellipse or region A (fig. EX2) as a
parallelogram. In the models, the shapes are also atoms. This restriction
will be abandoned eventually, but for now it is observed.
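
As an illustration of this representation, property lists can be imitated
with nested dictionaries. This is only a sketch in Python, not the LISP
property-list mechanism itself; the helper names put and get and the
table PROPERTIES are assumptions made for the example, and only the
entries quoted in the text above are filled in.

```python
# Property lists modelled as: atom -> {indicator: value}.
PROPERTIES = {}


def put(atom, indicator, value):
    """Store `value` under `indicator` in the property list of `atom`."""
    PROPERTIES.setdefault(atom, {})[indicator] = value


def get(atom, indicator):
    """Fetch an entry, or None when the atom or indicator is absent."""
    return PROPERTIES.get(atom, {}).get(indicator)


# The scene atom lists its regions; each region carries its own entries.
put("BOTTLES", "REGIONS", list("ABCDEFGHIJKLMZ"))
put("M", "NEIGHBOR", ["L", "Z"])
put("M", "SHAPE", "ELLIPSE")
```

Here get("M", "SHAPE") plays the role of (GET (QUOTE M) (QUOTE SHAPE)).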
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FIG. 'BOTTLES' 
A scene composed of regions A, B, ..., L, M, Z.


THE MODEL 


A model is an atom which contains in its property list, under the entry
'REGIONS', a list of region descriptions, each of which is a list of the
following form:

a) the first element of such a list is an atom, the name of the region,
as far as the model is concerned.
b) Each of the remaining elements of such a list is a property;
specifically, it is either a list (NEIGHBOR ...) or a list (SHAPE ...).
More complicated properties will be used when objects start getting more
complicated.

A model is composed of regions, with properties inter-relating them.

Given an object, there is a large number of models which correctly describe it.
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Fig. 'HOUSE'. A model.


Example. The model 'HOUSE' is written in this way (see fig. 'HOUSE'): 


HOUSE 

(in its property list, we find:) 

REGIONS ((A* (NEIGHBOR B*) (NEIGHBOR C*) (SHAPE PENTAGON) ) 
(B* (NEIGHBOR A*) (NEIGHBOR C*) (SHAPE PARALLELOGRAM) ) 
(C* (NEIGHBOR A*) (NEIGHBOR B*) (SHAPE PARALLELOGRAM)) ) 


What this list means is that HOUSE is composed of three regions, namely
A*, B* and C*; and that A* is a neighbor of B* and C*, etc.

Moreover, it gives the shapes of A* (pentagon), B* (parallelogram) and C*
(parallelogram). Additional properties could be inserted here.

The names A*, B*, etc., given to the different faces, have no importance;
they act as dummy variables (UAR or 'undefined' variables in CONVERT);
the names such as PARALLELOGRAM, PENTAGON, etc., given to the shapes, are


Fig. ‘PYRAMID’ 
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crucial, since they are going to be compared by equality with the
corresponding names in the property list of the regions of the scene. Note that
the models we are using are not "categorical" -- they (usually) do not contain
enough information to reconstruct the object.

Example.- PYRAMID (see fig. 'PYRAMID') is a model written as 


(DEFPROP PYRAMID ((A* (NEIGHBOR B*) (SHAPE TRIANGLE) ) 
(B* (NEIGHBOR A*) (SHAPE TRIANGLE))) REGIONS) 


--but also see fig. 'PYRAM1'.--


Fig. 'CYLINDER'. A model.
(DEFPROP CYLINDER
((A* (NEIGHBOR B*) (SHAPE ELLIPSE) )
(B* (NEIGHBOR A*)
(SHAPE (I C I D))) )
REGIONS)


Remark: Note how we describe B*'s shape as (SHAPE (I C I D)), i. e., as
(straight, convex, straight, concave).


Example.- A cube (parallelepiped) is described as 


(DEFPROP PARALLELEPIPED ((C* (NEIGHBOR E*) (NEIGHBOR D*) (SHAPE PARALLELOGRAM) )
(D* (NEIGHBOR C*) (NEIGHBOR E*) (SHAPE PARALLELOGRAM) )
(E* (NEIGHBOR D*) (NEIGHBOR C*) (SHAPE PARALLELOGRAM) ) )
REGIONS )


Fig. 'CUBE'. A model.
It is really a parallelepiped.
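
To make the role of NEIGHBOR and SHAPE concrete, the comparison of a model
with a scene can be sketched as a small backtracking search. This is a
Python illustration under stated assumptions, not the CONVERT code of DT:
the function match_model, the dictionary layout and the toy scene are
invented for the example, and the sketch ignores DT's requirement that an
object be completely visible.

```python
def match_model(model, scene):
    """Yield every assignment {model region: scene region} in which shapes
    are equal and each NEIGHBOR named in the model holds in the scene."""
    names = list(model)

    def extend(assignment):
        if len(assignment) == len(names):
            yield dict(assignment)
            return
        name = names[len(assignment)]
        for region, props in scene.items():
            if region in assignment.values():
                continue
            if props["SHAPE"] != model[name]["SHAPE"]:
                continue
            # Neighbors required by `name` toward regions already placed.
            ok = all(assignment[n] in props["NEIGHBOR"]
                     for n in model[name]["NEIGHBOR"] if n in assignment)
            # Neighbors required by already placed regions toward `name`.
            ok = ok and all(region in scene[assignment[m]]["NEIGHBOR"]
                            for m in assignment
                            if name in model[m]["NEIGHBOR"])
            if ok:
                assignment[name] = region
                yield from extend(assignment)
                del assignment[name]

    yield from extend({})


HOUSE = {
    "A*": {"NEIGHBOR": ["B*", "C*"], "SHAPE": "PENTAGON"},
    "B*": {"NEIGHBOR": ["A*", "C*"], "SHAPE": "PARALLELOGRAM"},
    "C*": {"NEIGHBOR": ["A*", "B*"], "SHAPE": "PARALLELOGRAM"},
}

# Hypothetical scene: one house (P, Q, R) and an unrelated triangle S.
TOY_SCENE = {
    "P": {"NEIGHBOR": ["Q", "R"], "SHAPE": "PENTAGON"},
    "Q": {"NEIGHBOR": ["P", "R"], "SHAPE": "PARALLELOGRAM"},
    "R": {"NEIGHBOR": ["P", "Q"], "SHAPE": "PARALLELOGRAM"},
    "S": {"NEIGHBOR": [], "SHAPE": "TRIANGLE"},
}
```

The house is found twice here, because B* and C* are interchangeable in
the model; a recognizer that erases matched regions from the scene, as DT
does, would report it once.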
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THE RESULTS 


We will now present several examples of scenes analyzed by the program DT
on the PDP-6 computer. The symbol y marks the lines typed by the user.


y CONV 4                               Bring the CONVERT processor from
                                       tape 4.
y (UREAD DT LISP 5 ↑Q ↑W)              Load the file containing DT,
                                       the recognizer.
y (UREAD EX2 LISP ↑Q ↑W)               Bring the scene EX2 into
                                       memory (see fig. 'EX2').
y (UREAD MOD2 LISP ↑Q ↑W) (IOC V)      Load the models.
(v)
y (DT (QUOTE CUBE) (QUOTE EX2))        Look for 'CUBES' in 'EX2'
(CUBE 1. IS (A B C))                   (see fig. 'EX2').
(CUBE 2. IS (J L M))
(D E F G H I K N O P Q R S T U V W X Y Z)         Remaining of scene.


y (DT (QUOTE CYLINDER) (QUOTE EX2))    Look for cylinders (see fig.
(CYLINDER 1. IS (E D))                 'CYLINDER').
(CYLINDER 2. IS (G F))
(A B C H I J K L M N O P Q R S T U V W X Y Z)     Remaining of scene.


y (DT (QUOTE HOLLOWCYLINDER) (QUOTE EX2))
(HOLLOWCYLINDER 1. IS (T U S))
(A B C D E F G H I J K L M N O P Q R V W X Y Z)


y (DT (QUOTE HOLLOWBRICK) (QUOTE EX2))
(HOLLOWBRICK 1. IS (N O P Q R))        See fig. 'HOLLOWBRICK'.
(A B C D E F G H I J K L M S T U V W X Y Z)


We define DD, a FEXPR that suppresses the QUOTEs:


y (DEFPROP DD (LAMBDA (A) (DT (CAR A)
(CADR A))) FEXPR)
DD
y (DD HOLLOWBRICK EX2)                 Compare with above.
(HOLLOWBRICK 1. IS (N O P Q R))        Good. Let us see another example.
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We now analyze FIG2 (see fig. 'FIG2') with DT.


y (DD PYRAMID FIG2)                    Looking for PYRAMIDs (see model
(PYRAMID 1 IS (L M))                   in fig. 'PYRAMID'). DD is like
                                       DT, but it is an FEXPR (cf. [28]).
(PYRAMID 2 IS (X Y))
(A B C D E F G H I J K N O P Q R S T U V W Z)


Note that pyramid L M N is not reported as such, but only L M is
reported or recognized. Why is this? Because the model 'PYRAMID' (see
fig. 'PYRAMID') is composed of two triangles. Also, it is in the nature
of our algorithm that L M prevents recognizing M N. In order to get
L M N, we define PYRAM1 as a pyramid which has three visible triangular
faces:

y (DEFPROP PYRAM1 ((A* (NEIGHBOR B*) (SHAPE TRIANGLE))
(B* (NEIGHBOR A*) (NEIGHBOR C*) (SHAPE TRIANGLE) )
(C* (NEIGHBOR B*) (SHAPE TRIANGLE)) ) REGIONS)
PYRAM1                                 See fig. 'PYRAM1'.
Now we apply this model to scene FIG2:
y (DD PYRAM1 FIG2)
(PYRAM1 IS (L M N))
(A B C D E F G H I J K O P Q R S T U V W X Y Z)   Aha. Only one is found.
                                       Correct. Only one pyramid with
                                       three visible faces is present in FIG2.


What we really want is to define a pyramid as something which shows
either two or three triangular faces; so,
y (DEFPROP PYR (OR PYRAMID PYRAM1) REGIONS)
PYR                                    The last model of an OR, PYRAM1
                                       in this case, is searched first.


At this moment, PYR is a model which stands for either PYRAMID or PYRAM1.
y (DD PYR FIG2)
(PYRAM1 1 IS (L M N))
(PYRAM1 2 IS (X Y))                    Good. Two objects were found to
                                       match with PYR: (L M N)
(A B C D E F G H I J K O P Q R S T U V W Z)       and (X Y). See fig. FIG2.


What would have happened if we had defined PYR in the reverse order?
Let us define

y (DEFPROP PYR (OR PYRAM1 PYRAMID) REGIONS)
PYR
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The last model in the OR list, PYRAMID in this case, is searched first.
The answer is:


y (DD PYR FIG2)
(PYRAMID 1 IS (L M))
(PYRAMID 2 IS (X Y))                   Two objects matched with PYRAMID:
                                       (X Y) and (L M); after
(A B C D E F G H I J K N O P Q R S T U V W Z)     this, no object was found
                                       to match with PYRAM1.
Conclusion: Order in the models is important, so long as we leave
things to the normal CONVERT matching algorithm.
y (DD PYR FIG3)
NIL                                    FIG3 is an empty scene.


y (DD CYLINDER FIG2)
(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)   No cylinders.
                                       Cylinder P O is partially occluded,
                                       so it is not found.


y (DD CUBE FIG2)
(CUBE 1 IS (I J K))
(A B C D E F G H L M N O P Q R S T U V W X Y Z)


y (DD ANGLE FIG2)
(ANGLE 1 IS (D A B C))
(E F G H I J K L M N O P Q R S T U V W X Y Z)


Angle is a model described on the next page (see fig. 'ANGLE').
Angle Q V R T U was not found because it has a different form (its
two dimensional projection has a different topology from model
'ANGLE'; namely, it has 5 faces or regions, and 'ANGLE' only 4).


Angle E F G H was not found because it is partially occluded.
y (DD SPHERE FIG2)
(SPHERE 1 IS (S))

Some models. 
(DEFPROP ANGLE ((A* (NEIGHBOR B*) (SHAPE FUNNY)) 
(B* (NEIGHBOR A*) (NEIGHBOR C*) (NEIGHBOR D*) (SHAPE ELE)) 
(C* (NEIGHBOR B*) (NEIGHBOR D*) (SHAPE PARALLELOGRAM) ) 
(D* (NEIGHBOR B*) (NEIGHBOR C*) (SHAPE PARALLELOGRAM)) ) 
REGIONS) 
This is the model for ANGLE. See figure on next page. ANGLE was used in FIG2.


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Fig. 'PYRAM1'. A model. Fig. 'ANGLE'. A model.


(DEFPROP PYRAM1 ((A* (NEIGHBOR B*) (NEIGHBOR C*) (SHAPE TRIANGLE) )
(B* (NEIGHBOR A*) (NEIGHBOR C*) (SHAPE TRIANGLE) )
(C* (NEIGHBOR B*) (SHAPE TRIANGLE) ) ) REGIONS)


(DEFPROP SPHERE
((A* (NEIGHBOR ==) (SHAPE CIRCLE)))
REGIONS)


Fig. 'SPHERE'. A model. 
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CHAPTER VII. MODELS 


A model is a written representation of an object that we want to identify. Models 
are mainly used for recognition of the object they represent; they are similar to 
patterns in CONVERT. Generally, a model can represent a large class of objects. 


We have already talked about models in TD (see notation FDL-1) and in DT;
the purpose of this chapter is to discuss them more systematically.
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2-dim representation of 3-dim models. 


2-dim models are capable of representing either two or three dimensional
objects. This is possible since, in analyzing a scene or a picture, we may
consider a 3-dim object as a 2-dim portion of the picture formed by several
2-dim regions (surfaces). In describing the model, the inter-connection of
the vertices of the object is given, plus additional properties or constraints
between different features (points, corners).

We will simultaneously talk about two types of such a representation; in
one of them (see fig. 'PARAL'), a whole 3-dim object is described by the
structure of its edges, as used in TD and Polybrick (chapters 5 and 4)
[15, 13], and is called edge-representation or notation. The other type uses
regions as building blocks of models; it is called the region-representation
or format, and is the one used by DT (chapter 6) [16] and some of the vision
group programs [22, 33, 38].


Fig. 'PARAL'. Representation of a parallelepiped as a 3-dim model.


Models written in edge-notation.- We give as an example the parallelepiped of
figure 'PARAL', which may be represented (written) as

(a (b g f) b (c a) c (d g b) d (e c) e (f g d) f (e a) g (c e a) )

plus the additional properties

(slope b a m1) (length b a L1)
(slope c g m1) (length c g L1)
(slope d e m1) (length d e L1)
(slope b c m2) (length b c L2)


(1)See the FDL-1 language in chapter 5. 
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(slope a g m2) (length a g L2)
(slope f e m2) (length f e L2)
(slope c d m3) (length c d L3)
(slope g e m3) (length g e L3)
(slope a f m3) (length a f L3)
Plus the additional [pseudo]property
(variables m1 m2 m3 L1 L2 L3)
which indicates (see fig. 'PARAL' again) that the symbols m1, ..., L3 are
dummy variables that may have any value, the only restriction being that
this value be the same for each occurrence of the symbol. Variables which
behave in this form are called bound variables in logic, and UAR (undefined
variables) in CONVERT. Under this convention, we see that
(slope b a m1)
(slope c g m1)
(slope d e m1)
means three parallel lines.
Properties such as (slope b a m1) are in general function-predicates
which have as arguments vertices and undefined variables; the user may
define arbitrary (LISP) properties, which represent constraints on the
figure or object that the model has to match.
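
The behavior of undefined variables in such properties can be sketched as
follows. This is a Python illustration, not FDL-1 itself: the coordinates
in POINTS are hypothetical values chosen so that the drawing is a
parallelepiped, `length` is computed as a squared length to keep the
arithmetic exact, and the names bind and PREDICATES are assumptions. A
variable is bound at its first occurrence and must agree with every later
occurrence.

```python
from fractions import Fraction

# Hypothetical picture coordinates for the vertices of fig. 'PARAL'.
POINTS = {"a": (1, 2), "b": (0, 0), "c": (3, -1), "d": (5, -1),
          "e": (6, 1), "f": (3, 2), "g": (4, 1)}

PROPERTIES = [
    ("slope", "b", "a", "m1"), ("length", "b", "a", "L1"),
    ("slope", "c", "g", "m1"), ("length", "c", "g", "L1"),
    ("slope", "d", "e", "m1"), ("length", "d", "e", "L1"),
    ("slope", "b", "c", "m2"), ("length", "b", "c", "L2"),
    ("slope", "a", "g", "m2"), ("length", "a", "g", "L2"),
    ("slope", "f", "e", "m2"), ("length", "f", "e", "L2"),
    ("slope", "c", "d", "m3"), ("length", "c", "d", "L3"),
    ("slope", "g", "e", "m3"), ("length", "g", "e", "L3"),
    ("slope", "a", "f", "m3"), ("length", "a", "f", "L3"),
]


def slope(p, q):
    """Exact slope; a vertical segment raises ZeroDivisionError."""
    return Fraction(q[1] - p[1], q[0] - p[0])


def length2(p, q):
    """Squared length, used here in place of the length itself."""
    return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2


PREDICATES = {"slope": slope, "length": length2}


def bind(properties, points):
    """Return the variable bindings if all properties agree, else None."""
    bindings = {}
    for name, p, q, var in properties:
        value = PREDICATES[name](points[p], points[q])
        if bindings.setdefault(var, value) != value:
            return None  # same variable, different value: no match
    return bindings
```

Moving a single vertex, for instance g, breaks the agreement of m1 and
makes bind return None, which is how a non-parallelepiped is rejected.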


Models written in region-notation.- Surfaces (faces) are given names,
and the neighborhood relation between them is indicated; in addition,
each region has a description of its shape, pretty much in edge notation.
Fig. 'PARALE' (page ) is described in this way.

Vertices are treated as being two-dimensional, that is, the coordinates
with respect to the (frame of the) picture or scene are used; coordinates
are then those of the projection over the plane of the drawing; all the
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points of a model are coplanar, so the z-coordinate is not indicated.


Multiple models for the same object.- A three dimensional object will,
in general, have more than one 2-dim representation; for instance, the
body with an L shape shown in figure 'ELE' will have three or four (5?)
models, according to the position from where you are seeing it.


Fig. 'ELE'. A three dimensional object has four or
more different representations as a model, if the
model contains only 2-dim information about the
(relative) position of the vertices.


Our goal in this chapter is, using the representations for models that
we just described, to develop a notation in which models will be easily
expressed, and to investigate different conventions regarding the model
attached to a given object; that is, given an object, how will we write
its model?

It would be nice if we could express in the same notation both the model
and the figure or scene --as done in FDL-1--.

The main use of the model will be in the recognition and identification
of objects in a scene.

I will now present several approaches, which I call First Approach,
Second Approach, etc., to this problem. Some of them have been programmed,
tested, thought about, etc. (see chapters 4, 5, and 6), and this is so
indicated when they are.


First Approach: Multirepresentation.


Since, in general (we hope) the different models of the same object will
be just a few (less than 5? Certainly, less than a dozen), we could define
a complex (compound) model composed by the OR of several simple ones,
such as


(figure: a model name defined as the OR of several line drawings of the object)


That is: we accept this multiplicity of models and try to get used to it.
The program DT works in this way, using 2-dim models of 3-dim objects; the
description of the models is made in terms of regions, instead of lines.
In chapter 6 we see an example of this kind of recognition-identification
(pages 77-78): the way DT finds pyramids in the figure 'FIG2'.
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The program TD also works in this way (cf. chapter 5), using 2-dim
models of 3-dim objects; the description of the model is made in terms of
lines --instead of regions-- pretty much the way we have been describing
models in this section. See the way TD recognizes 'Xs' in figure 'EQUIS'
(page 61). Note also that, if we do not define the complete set of different
models of an object, we run the risk of failing to recognize the object (see
figure d24, page 63) when it is found in the scene in some positions.

Conjecture: If for each object we write all its possible models we may
run out of storage. Maybe not; maybe fairly simple objects can be
represented by just a few 2-dim models.


Order in the models.- When our model is compound, that is, when we have

MOD = (OR MOD1 MOD2 ... MODm)

then the recognition is done (we are talking of programs DT and TD) from
right to left: we find all the instances of model MODm in the scene, and
erase them; then all instances of model MODm-1, etc. In this way, rare
representations of the object could be included in the OR list (towards
its left end), to be used only when more usual models have failed.
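
The effect of this right-to-left order can be reproduced on a toy scene
modeled after the PYR example of chapter 6. The sketch below is Python
written for illustration, not the CONVERT matcher; the scene TOY, the
helper names find_instances and search_or, and the brute-force search over
permutations are all assumptions. Each model is searched in turn, starting
with the last one in the OR list, and every instance found is erased
before the search continues.

```python
from itertools import permutations

TRI = "TRIANGLE"
MODELS = {
    "PYRAMID": {"A*": {"NEIGHBOR": ["B*"], "SHAPE": TRI},
                "B*": {"NEIGHBOR": ["A*"], "SHAPE": TRI}},
    "PYRAM1": {"A*": {"NEIGHBOR": ["B*"], "SHAPE": TRI},
               "B*": {"NEIGHBOR": ["A*", "C*"], "SHAPE": TRI},
               "C*": {"NEIGHBOR": ["B*"], "SHAPE": TRI}},
}

# Hypothetical scene: a three-faced pyramid L M N, a two-faced one X Y,
# and a sphere S.
TOY = {
    "L": {"NEIGHBOR": ["M"], "SHAPE": TRI},
    "M": {"NEIGHBOR": ["L", "N"], "SHAPE": TRI},
    "N": {"NEIGHBOR": ["M"], "SHAPE": TRI},
    "X": {"NEIGHBOR": ["Y"], "SHAPE": TRI},
    "Y": {"NEIGHBOR": ["X"], "SHAPE": TRI},
    "S": {"NEIGHBOR": [], "SHAPE": "CIRCLE"},
}


def find_instances(model, scene):
    """Find instances of `model` one at a time, erasing each from `scene`."""
    names = list(model)
    found = []
    while True:
        for regions in permutations(list(scene), len(names)):
            bound = dict(zip(names, regions))
            if all(scene[bound[n]]["SHAPE"] == model[n]["SHAPE"]
                   and all(bound[m] in scene[bound[n]]["NEIGHBOR"]
                           for m in model[n]["NEIGHBOR"])
                   for n in names):
                found.append(regions)
                for r in regions:
                    del scene[r]  # erase the instance from the scene
                break
        else:
            return found  # no further instance exists


def search_or(or_list, scene):
    """Search the models of an OR list from right to left."""
    scene = dict(scene)  # work on a copy; regions are erased as we go
    report = []
    for name in reversed(or_list):
        for regions in find_instances(MODELS[name], scene):
            report.append((name, regions))
    return report
```

With the list (PYRAMID PYRAM1) the three-faced pyramid is found whole;
with (PYRAM1 PYRAMID) the pair L M is taken first and PYRAM1 then finds
nothing, which is the order dependence described above.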


Second Approach: Two-dimensional Patterns.- 


This could also be considered as an extension to the first approach.
Properties, defined by the user, may be very complicated functions of the
(coordinates of the) vertices [cf. FDL-1, chapter 5], value of slopes,
distances, etc. Nevertheless, these properties are attached to a mesh of
connections specifying the seen edges, which is topologically
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invariant (1). We would like to be able to have ways to specify variations,
modifications and additions to this network, in a rich and systematic way.
Using solution 1 (first approach) is not enough: suppose I want to define
a STAR as an object having an arbitrary (bigger than 3) number of equal
"peaks", equally distributed, as in fig. 'STARS'.


STAR3 STAR4 STAR5 STAR6 STAR11
Fig. 'STARS'. Different objects
which could fall under the same
generalized model.


We do not want to say STAR = (STAR1 or STAR2 or STAR3 or ... )
We would like to say
STAR = (PEAK PEAK PEAK STARS)


where STARS = (~% or (PEAK STARS) )


~% is the null string.
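
For a linear string of symbols, this recursive definition translates
directly into a recursive recognizer. The sketch below is Python and
treats a star as a flat list of "PEAK" tokens, which is an assumption made
only to show the recursion; it says nothing about how the peaks are
arranged in two dimensions, which is the open problem discussed here.

```python
def is_stars(tokens):
    """STARS = (null or (PEAK STARS)): zero or more peaks."""
    if not tokens:
        return True  # the null string
    return tokens[0] == "PEAK" and is_stars(tokens[1:])


def is_star(tokens):
    """STAR = (PEAK PEAK PEAK STARS): three peaks, then any number more."""
    return tokens[:3] == ["PEAK"] * 3 and is_stars(tokens[3:])
```

One definition therefore covers STAR3, STAR4, ... without an explicit OR
over every case.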


More pictorially, and more informally too,


(figure: the same definition of STAR, with the peaks drawn as line figures)


How good could this approach get? 
If we were going to specify patterns for a linear string (or for an
S-expression), this would be the approach we would take. Observe how easy it is


(1) It is interesting to note here that Evans [8] used essentially this kind
of representation, but he used few attached properties.
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to say in CONVERT 


(== == ==) for any list containing exactly three elements;
(X === X), with X as UAR variable,
for any list with two or more elements, the first equal to
the last, but otherwise totally arbitrary;
(EVEN) PAT (*OR* () (== == EVEN)): this definition of the fragment
EVEN makes it possible for (EVEN) to stand for a list with an
even number of elements, but arbitrary otherwise.
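
What each of these three patterns accepts can be restated as a plain
predicate over a list. The following Python functions are only a reading
aid with invented names; they are not CONVERT and do not go through a
pattern interpreter. The third one mirrors the recursive definition of the
fragment EVEN.

```python
def three_elements(items):
    """(== == ==): exactly three elements, each arbitrary."""
    return len(items) == 3


def first_equals_last(items):
    """(X === X): X is fixed by the first element and must recur last."""
    return len(items) >= 2 and items[0] == items[-1]


def even(items):
    """EVEN = (*OR* () (== == EVEN)): empty, or two elements then EVEN."""
    if not items:
        return True
    return len(items) >= 2 and even(items[2:])
```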
Probably an extension of this notation will allow us to specify two-dimensional
patterns, in a CONVERT-like manner, which will then be used for matching,
i. e., for recognition of objects in a scene. In order to achieve this,
we have to specify

syntax    --- the primitive constituents (primitive patterns) of
              our notational language.
          --- the ways new patterns are formed from patterns.
semantics --- the way the matching or identification is carried
              out; that is, what a 2-dim pattern stands for.
Non-trivial problems to solve are also:
--- to find a good written representation of the patterns.
--- the internal (machine) representation of the patterns.
--- the interpreter (recognizer) for such 2-dim patterns; i. e.,
the algorithm which the machine will use in order to carry out
the match or comparison, expressed in a meaningful language (*).
Incidentally, in this last point questions of efficiency in time (speed
of execution), efficiency in space (size of program + magnitude of
intermediate swell (2) + extent of data), efficiency in use (easiness of writing,


(*) That is, in a language which the machine is able to understand/execute.


(2) Phrase used by Tobey [35].
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understanding, modifying and debugging a program, model or pattern), etc.,
have to be considered --the so-called implementation details--.
For the moment, through the remainder of this chapter, we will not worry
about implementation and we will use for written representation of the
patterns a mixture of line drawing and atomic symbols, as we already did
with STAR (cf. page 86).

Let me point out briefly the syntax and semantics of CONVERT-patterns,
that is, linear --unidimensional-- strings of symbols, and then I will do
the same for 2-dim.


UNIDIMENSIONAL (CONVERT-type) PATTERNS.


Terminal Patterns.
()      stands for ()
==      stands for or matches any S-expression.
=ATO=   matches any atom.
A       some other atom, if it does not appear with a
        definition in the dictionary, stands for itself;
        it will match only with an identical atom.
===     matches with any fragment such that the remainder
        of the pattern finds an acceptable match with the
        remainder of the expression under comparison.


There is a way to define boolean combinations of patterns.


Definitions can be done in several ways in the dictionary, and we may define
a single atom to represent a whole pattern, this last being either an
S-expression or a fragment.


Recursive definitions are possible.


Concatenation: the pattern (P1 P2 ... Pm) where Pi are patterns, stands
for a list of m elements (E1 E2 ... Em) such that each element Ei is
represented (is matched) by the corresponding pattern Pi.
A way exists to isolate subparts of a pattern and to have them available
for future analysis or for other purposes.
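
The terminal patterns and concatenation described above are enough for a
very small matcher. The following Python sketch is a stand-in written for
this purpose and is not the CONVERT interpreter: S-expressions are nested
Python lists of strings, and the dictionary, UAR variables, boolean
combinations and the isolation of subparts are all left out.

```python
ANY, ATOM, FRAGMENT = "==", "=ATO=", "==="


def match(pattern, expr):
    """True if `expr` is matched by `pattern`."""
    if pattern == ANY:
        return True                      # == matches any S-expression
    if pattern == ATOM:
        return isinstance(expr, str)     # =ATO= matches any atom
    if isinstance(pattern, str):
        return pattern == expr           # an atom stands for itself
    if not isinstance(expr, list):
        return False
    return match_seq(pattern, expr)      # concatenation (P1 P2 ... Pm)


def match_seq(patterns, items):
    """Match a list of patterns against a list of elements."""
    if not patterns:
        return not items                 # () stands for ()
    head, rest = patterns[0], patterns[1:]
    if head == FRAGMENT:
        # === takes any fragment that lets the rest of the pattern match.
        return any(match_seq(rest, items[k:])
                   for k in range(len(items) + 1))
    return (bool(items) and match(head, items[0])
            and match_seq(rest, items[1:]))
```

For instance, match(["a", "===", "z"], ["a", "b", "c", "z"]) succeeds
because === absorbs the fragment b c.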


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2-dimensional Models.- TD and Polybrick use models where the connection
matrix is in terms of the vertices, as illustrated in fig. 'PARAL'; that
is, atoms represent points. The notation called FDL-1 (chapter 5) is
also developed with this convention.


A model as used by TD is a list of the form
( connectionlist 'WHERE' properties )
where connectionlist is a list of points and neighbors (example refers to
figure 'PARALLELOGRAM'): (A (B D) B (A C) C (B D) D (A C) )


properties is a list of properties:


((slope b c m1) (slope a d m1) (slope b a m2) (slope c d m2) (variables m1 m2))
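
The connectionlist alternates a point with the list of its neighbors, so
it can be turned into an adjacency table and a set of edges. The Python
sketch below illustrates that reading of the notation; the function names
adjacency, edges and is_symmetric are assumptions and the code is not part
of TD.

```python
def adjacency(connection_list):
    """Turn (A (B D) B (A C) ...) into {point: set of neighbors}."""
    pairs = zip(connection_list[0::2], connection_list[1::2])
    return {point: set(neighbors) for point, neighbors in pairs}


def edges(adj):
    """Each undirected edge once, as an unordered pair of points."""
    return {frozenset((p, n)) for p, ns in adj.items() for n in ns}


def is_symmetric(adj):
    """True if every stated neighbor relation is stated both ways."""
    return all(p in adj.get(n, set())
               for p, ns in adj.items() for n in ns)


PARALLELOGRAM = ["A", ["B", "D"], "B", ["A", "C"],
                 "C", ["B", "D"], "D", ["A", "C"]]
```

For the parallelogram this gives four edges, A-B, B-C, C-D and D-A, each
mentioned from both of its endpoints.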


DT and the summer-vision group programs [12, 16, 22, 27, 33, 37, 38], on
the other hand, are using regions (faces) as elementary constituents among
which the relations of neighborhood are specified; in models for DT,
atoms represent regions. For instance, the same figure 'PARAL', a
parallelepiped, is described as

( (A* (NEIGHBOR B*)
(NEIGHBOR C*)
(SHAPE PARALLELOGRAM) )
(B* (NEIGHBOR C*)
(NEIGHBOR A*)
(SHAPE PARALLELOGRAM) )
(C* (NEIGHBOR B*)
(NEIGHBOR A*)
(SHAPE PARALLELOGRAM)) )


As we see, 'SHAPE' indicates the shape of the region; in this case, the
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Fig. 'PARALLELOGRAM'.
A model in edge-notation is a list of three elements:
the connectionlist, the conjunction 'WHERE' and a list
of properties. See FDL-1 in chapter 5.
shape is an atom, 'PARALLELOGRAM'; in general, it will be a list of points
and segments. Undefined variables are local for each model, but not for
each region: if two regions of the same model mention the same
undefined variable, this atom will in fact represent the same
quantity, but if the same variable is mentioned in two models,
no relation holds between them. In this way, slopes, lengths,
etc., are transmitted between regions.


Fig. 'PARALE'.
Model of a parallelepiped.
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Fig. 'WEDGES'. A scene.
The regions A and F have to
be fused together and the
result to match a region-model
with shape 'parallelogram'.


Differences and Similarities between the two representations of models.-


Both representations use symbolic descriptions of an object, suitable
for comparison or recognition (matching); the edge-representation (as in
TD) is easier to understand and contains less redundant information; the
representation by regions (as in DT) is more cumbersome to read; it has
more repetitions of information.

There is a good advantage in using the representation by regions of
a model: the comparison is made using bigger "elementary units", so the
resulting program is less complicated (compare the sizes of TD and DT);
I also believe the match is done faster, because the tree has fewer branches.


Fig. 'WEDGES...'
False segments are found in
regions A and F; they are
the dotted ones.


Another advantage of the region-representation seems to be evident when
dealing with two distinct regions that are really one; for instance, in
figure 'WEDGES' we have to realize that regions A and F are really a
continuation one of the other, and that the union (1) of both will form


(1) Not really union, because one has to assume the hidden part...
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a 'gregion' (general region) that will match a region-model having the shape
'parallelogram'; this being the case, if the scene is represented in terms
of regions, it is easier to identify regions A and F as 'mergeable' and to
construct from them the gregion AF. Of course, this is also possible,
but cumbersome, when 'WEDGES' is represented in FDL-1 format, that
is, by edges.


Another advantage in using the region-representation of a scene is
that it allows one to talk, for each region, of 'spurious' boundaries, that
is, boundaries that do not actually belong physically to the region, but
are the result of superpositions. DOTS is the name of a program that
analyzes each region and tries to determine, using the information about
T-joints (terminology explained in chapter 8), which boundaries are 'false',
and marks them with 'dots'.

A given segment may be 'false' with respect to one region, but 'true'
with respect to the neighbor region; this is a property of a pair
region-segment.


For instance, DOTS converts figure 'WEDGES' into figure 'WEDGES...'


TWO-DIMENSIONAL PATTERNS (CONVERT~-type models). 


A terminal pattern is a model with no special marks, with or without properties.
(In this section, we generally refer to the edge-notation, as TD uses it,
but our remarks apply also to any other representation of a model; we will
use a mixture of line drawings and atomic symbols as written representation
of models).


Up to this point, all the models have been of the type 'terminal patterns',
with the exception of the (OR MOD1 MOD2 ...) model.

(OR MOD1 MOD2 ... MODn). This pattern will match with a figure if this
figure matches one of the models MODi; the first (rightmost) that matches
is accepted, no more are tried.
Fig. 'LEG'. A semi-model. The tying points, A* and B* here,
have additional lines or edges that connect the semi-figure
that this semi-model represents to a bigger figure.


Semi-models.- With this name are designated patterns that are terminal
patterns, except that they are joined to a bigger figure by some points,
the tying points. In fig. 'LEG', A* and B* are tying points.

A tying point (like B* in fig. 'LEG') must have as neighbor, in
addition to the specified 'normal' points (C* and F*), a point L*,
nothing of which is known. K* is also a 'missing neighbor' of the tying
point A*.


Fig. 'TABLE'. It has five 'LEGS'.

TD is capable of handling semi-models; for instance, when we look for LEGS
in scene 'TABLE', five legs are found.
The way to specify a semi-model in TD notation is simply to avoid talking,
in the connection-matrix, about the points K* and L*:
(A* (K* C* D*) B* (C* L* F*) C* (A* B* E*) D* (E* A*) E* (D* F* C*)
F* (E* B*))

Note that we mention that A* has K* as neighbor, but we do not say which 
are the neighbors of K*. 
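
As an illustration only (Python is used here for convenience; it is not the
notation of TD or CONVERT), the connection-matrix of 'LEG' can be held as a
table from each point to its neighbors. Under that assumption, the missing
points are simply the names that appear as neighbors but have no entry of
their own, and the tying points are the points adjacent to them. The function
names are hypothetical.

```python
# Connection matrix of the semi-model 'LEG', transcribed from the text:
# each point maps to the list of its neighbours.
LEG = {
    "A*": ["K*", "C*", "D*"],
    "B*": ["C*", "L*", "F*"],
    "C*": ["A*", "B*", "E*"],
    "D*": ["E*", "A*"],
    "E*": ["D*", "F*", "C*"],
    "F*": ["E*", "B*"],
}


def missing_points(matrix):
    """Points named as neighbours that have no entry of their own."""
    named = {n for neighbours in matrix.values() for n in neighbours}
    return sorted(named - set(matrix))


def tying_points(matrix):
    """Points that have at least one missing point as neighbour."""
    missing = set(missing_points(matrix))
    return sorted(p for p, nbrs in matrix.items() if missing & set(nbrs))


print(missing_points(LEG))  # ['K*', 'L*']
print(tying_points(LEG))    # ['A*', 'B*']
```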

Properties may use the coordinates of K* and L* (the "missing points");
for instance, we could ask for the same slope between lines K* - A* and
A* - C* (see again fig. 'LEG').


Union or Concatenation of Patterns.- New patterns are formed from old ones
by soldering together some of their points; see the =TIE= statement in FDL-1.
The pattern has the form

(=TIE= PAT1 N1 PAT2 N2 ... PATm Nm ( union1 union2 ... unionk ) )

For instance, the following figure may be described as

(=TIE= (A (B C) B (A C) C (A B)) (1)
       (E (D F) D (E F) F (E D)) (2)
       ( (C 1 TO E 2)
         (B 1 TO D 2) ) )

(Figure: two triangles, A B C and D E F, joined at the tied points.)


This feature is not implemented in TD. Originally we proposed to do it by
forming a new terminal pattern which would be equivalent to (=TIE= ...) but
simpler than it.

The former pattern would be converted (by TD, at some stage) into
(A (B C) B (A C F) F (C B) C (A B F)) plus properties. Since we can tie
a figure to itself several times, renaming of the vertices has to be done;
we use here the GENSYM capabilities of LISP and CONVERT.
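
The conversion of a tie into a single terminal pattern can be sketched as
follows. This is a minimal sketch in Python, not the unimplemented TD
feature: it assumes each pattern is a table from vertex to neighbor list,
that each union is a pair (vertex of the first pattern, vertex of the
second), and that properties are ignored. The names tie and gensym are
hypothetical; gensym only imitates the spirit of LISP's GENSYM.

```python
import itertools

_counter = itertools.count(1)


def gensym(prefix="G"):
    """Return a fresh vertex name, in the spirit of LISP's GENSYM."""
    return f"{prefix}{next(_counter)}"


def tie(pat1, pat2, unions):
    """Solder pat2 onto pat1 and return one connection table.

    unions is a list of (v1, v2) pairs meaning 'v1 of pat1 is v2 of pat2'.
    Untied vertices of pat2 whose names clash with pat1 are renamed.
    """
    rename = {v2: v1 for v1, v2 in unions}
    for v in pat2:
        if v not in rename:
            rename[v] = gensym() if v in pat1 else v
    result = {v: list(nbrs) for v, nbrs in pat1.items()}
    for v, nbrs in pat2.items():
        target = result.setdefault(rename[v], [])
        for n in nbrs:
            new_name = rename.get(n, n)
            if new_name not in target:
                target.append(new_name)
    return result


abc = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
edf = {"E": ["D", "F"], "D": ["E", "F"], "F": ["E", "D"]}
print(tie(abc, edf, [("C", "E"), ("B", "D")]))
# {'A': ['B', 'C'], 'B': ['A', 'C', 'F'], 'C': ['A', 'B', 'F'], 'F': ['C', 'B']}
```

The result agrees with the pattern (A (B C) B (A C F) F (C B) C (A B F))
given in the text. Tying a figure to itself goes through the renaming branch.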


Comment: It looks like this way of concatenation is easy but messy. 


A different representation.- Daniel Conrad (1) is interested in the
generation of recursive figures. A line is represented as a series of n points
that are r units apart lying in the direction d, or (n, r, d).


A figure is a dictionary of cycle 2 of the form
(N V1 F1 V2 F2 V3 F3 ... Vn Fn)

V1 is the first line in the figure. Since each line or edge of a figure
should end at a vertex, it may be referred to as a vertex.

F1 is a figure built on the vertex V1. It in turn has the same form
(N' V1' F1' V2' F2' ... Vm' Fm') as the larger figure of which F1 is a
subfigure.

V2 is the next vertex. It will always have its tail on the tip of the
previous vertex.

F2 is the subfigure of V2 ...

These figures are plotted on the printer.


A different representation.- William Martin [25] also displays figures,
this time in the scope of the PDP-6.

Several others [19, 29, 34] have symbolic representations for the
purpose of constructing figures. I will not discuss their work here.

(1) Planar figures and LISP functions to manipulate them are described in [6];
the use of CONVERT to construct these figures is explained in [7].


So far we have seen two approaches to the use of models for identifi- 
cation. A third will be presented now. 


Third Approach: 3-dim transparent models (edge-representation).- These
models can be considered as a 3-dim wired structure, wires corresponding
to edges, plus properties establishing restrictions between vertices,
slopes, lengths, etc. For instance, a tetrahedron will be modelled as
follows (see fig. 'TETRAHEDRON'):

((A (B C D) B (A C D) C (A B D) D (A B C)) where
 ((length a b n1) (length a c n1) (length a d n1)
  (length b c n1) (length b d n1) (length c d n1)
  (variables n1)) )

The vertices of these models have 3 coordinates; properties now refer to
coplanarity, etc.

Fig. 'TETRAHEDRON'. This is a three-dimensional model; it is seen here
resting on a table.

We only need one of these 3-dim models to describe completely an object;
the problem is how to compare this 3-dim model against a 2-dim scene.

A possible way, which we began to think about for a while, was to direct
the machine to use information of which lines may possibly be occluded,
given that certain others are already seen.

That is, the model should contain enough information, or the program
should be written in such a way, that after having identified some lines,
it would be possible to predict or know which lines of the 3-dimensional
model are necessarily hidden, so as not to look for them.
For instance, suppose we are looking for 'Fs' (see fig. 'F') in a scene,
and we have already found lines K-A, A-B, B-G, B-C, C-D; then the program
would recognize that line E-F would be occluded, and would not try to
find it.

The trouble with this approach seems to be the sophistication of the
program necessary in order to "predict" the lines which are going to be
occluded and the lines that are required to be present in the scene. This
situation could be somewhat alleviated if the user supplies --as part of
the model-- for each point of view the list of visible and invisible lines
(or regions) from that position. Instead of a true-false dichotomy for
visibility, we could have several categories: visible, partially
visible, invisible; possibly others, e.g., all-or-none.

Fig. 'F'. A transparent 3-dim model.
3-dim models projected into 2-dim patterns.- In the last paragraphs we
have explained the idea of a 3-dim model with associated lists, one for
each direction of view, these lists containing information about the regions
or lines visible from that particular line of sight, and the lines which
the object itself makes invisible from that direction of view.

It could be possible for a program to produce these lists itself, that
is, when comparing the scene with the 3-dim model, to use the results of
this comparison in order to get the best "line of sight"; best in the sense
that, if we see our 3-dim model from that position, we will obtain a 2-dim
projection that would closely resemble that part of the scene under
comparison, if the model and the object are really the same. The scene will
drive the construction. In this way, we are producing the 2-dim model as
meeting the requirements of the scene (1), but at the same time the model
we should produce has to be a projection of the 3-dim model of the object
we want to identify, so recognition is achieved.

Perhaps the main difficulty in this approach is the fact that very
little is known about symbolic projections. Also the amount of computation
might be large.


Numerical Models.- Roberts [29] uses 3-dim models; these are numerical in 
nature, and are represented by lists of tied blocks connected in rings. See 
the Coral language [34] for this ring structure. 


Each model block is tied to lists of its points, lines and surfaces. 


(1) When we eventually finish the construction of the 2-dim model, matching
against the corresponding part in the scene will be easy, and could be
reduced to a simple check-up, since we tailor the model to produce (some
of) the regions in the scene.


Curved Objects.- Objects containing curved edges are represented by the
same kind of models we described for rectilinear objects. We have now
more than one kind of segment joining two points, and some notation
must be used for them (1).

When the surfaces adopt sophisticated curvatures and inflexions, the
kinds of models we have described will be inexact. There are major conceptual
problems to be faced if we are to find really good models for
intricately-curved surfaces. We can perhaps take a gloomy comfort in the
fact that humans are very weak (much weaker than they think!) in their
mental ability to deal with such things.

(1) A straightforward representation of curved line segments is given by
White [38].
CHAPTER VIII. DISCUSSION OF SOME SCHEMES FOR RECOGNITION.


The following subject is treated in this chapter: assume that a
preprocessor (see chapter 3) has transformed a scene into a line
drawing or a set of regions, and that a symbolic description of them is
available. Independently or otherwise (1), the computer has also in its
memory a collection of lists or patterns called models (see chapter 7),
which define objects or classes of objects we want to find or recognize.
We discuss here some algorithms which, using as data the symbolic
description of the scene and the models of the objects we are interested
in, go ahead and find them. Chapter 2 talks about some of the problems
that we expect to meet.

Some of these algorithms or variations have already been put into
practice; see chapters on TD, DT and Polybrick.

(1) For instance, learning is discussed in Cyclops-2 [3].
The One-to-one matching scheme.


Under this scheme, identification of a given object by means of a model is
done only if all the features present in the model are also present in the
object --in the scene--; that is, partially occluded bodies are not
identified, unless the model in question specifically has as don't-care
conditions the faces or lines missing in the scene.

DT performs mainly this kind of matching or recognition. It has
to care essentially for finding the right regions, having the required
neighbors, erasing the identified bodies, and repeating again.

TD is more sophisticated, being able to identify overlapping
transparent objects. The proper vertices are searched for, and complicated
binding-restorations have to be made to account for failures and
not-yet-defined properties. See chapter 5.

Polybrick does not perform a one-to-one matching.

Evans' identification program [8] makes first a one-to-one matching
between two figures, using one of them as a model, but it has provisions
to abandon this mode (he "weakens" the requirements) if necessary. In a
complicated sense it, too, has to 'account for failures' and it has a set
of scoring systems to decide which of a number of matching attempts has the
'least amount of failure'.


Implementation.- In DT and TD, the model to be matched one-to-one to the
scene is converted to a CONVERT pattern, then definitions are added to the
dictionary, and the pattern is handed to the CONVERT processor, which executes
it. In this way, we avoid the double interpretation which would occur if
we keep both model and scene in their original format and use a program
to scan the model, choose a feature (a region, a line) and search the scene
for it; then, scan the model more, select another feature and search the
scene looking for a match for it; scan the model some more, etc.


LINEAR WEIGHTING 


Weights.- Given a model, we attach coefficients to each of its parts, and
also assign a number or threshold to the entire model, as is indicated
in figure 'WEIGHTS', where the coefficients or weights are assigned to lines.


Fig. 'WEIGHTS'. Coefficients of 2 are assigned
to each line of this model; the total sum is
16. If we set the threshold = 10, then we allow
three lines to be missing.


The weight of a given feature represents the relative value of this feature;
in fig. 'WEIGHTS' all lines have the same value, 2. Therefore, the total
weight of this model is 16; the threshold value is set to a lower number,
say 10. The recognizer is instructed to try to match each feature of the
model with the corresponding feature of the scene, as in one-to-one matching,
but in addition it accepts matches even if the features do not agree --that
is, even in a normal failure--. At the end, we have a match of value v,
this number being the sum of the weights of the features which did agree.

We reject the match if v is smaller than the threshold of the given model.
Important features have big numbers; the nearer the threshold to the
total value of the model, the more "strict" we are with our matches.

This scheme is a majority consensus; to the extent that it works, its
success is due to the fact that random lines will have a low probability
of being aligned so as to match some model. That is, "if some figure looks
enough like a cube, it has to be a cube."

Linear weighting is easy to implement, but it is weak in differentiating
between two slightly different models.
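
The acceptance rule just described can be sketched in a few lines. This is
only an illustration in Python under simple assumptions: the model is a table
from feature name to weight, the matcher has already decided which features
agree, and the feature names (line1 ... line8) are hypothetical stand-ins for
the eight lines of fig. 'WEIGHTS'.

```python
def match_value(weights, found):
    """Sum the weights of the model features that agreed with the scene.

    weights: dict feature -> weight; found: set of features that agreed.
    A feature that is missing or wrong contributes 0.
    """
    return sum(w for feature, w in weights.items() if feature in found)


def accept(weights, threshold, found):
    """Reject the match when its value v is smaller than the threshold."""
    return match_value(weights, found) >= threshold


# Eight lines of weight 2 (total 16), threshold 10, as in fig. 'WEIGHTS'.
lines = {f"line{i}": 2 for i in range(1, 9)}
five_seen = {"line1", "line2", "line3", "line4", "line5"}  # three missing
four_seen = {"line1", "line2", "line3", "line4"}           # four missing

print(match_value(lines, five_seen), accept(lines, 10, five_seen))  # 10 True
print(match_value(lines, four_seen), accept(lines, 10, four_seen))  # 8 False
```

With three lines missing the value is exactly the threshold and the match is
kept; a fourth missing line rejects it.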


Sub-weights.- When using the region-notation, an improvement can be made if
we assign also weights to the segments that form the boundary of a region.
We will have now two thresholds: one for accepting a face or a region when
the lines found for it are enough to exceed the threshold for the
region; another for accepting a collection of regions to form an object.


Fig. 'LINEAL'. A face is good if 3 out of 4 of its lines
are seen; a body is good if 3 of 4 of its faces (regions)
are seen. Under this 75% criterion, only the cube behind
the cone fails to be recognized.


(*) Bad matches are those when the complete line is missing, or it goes in
the wrong direction. They contribute with a weight of 0 to the total
value of the match.
Note that even disconnected bodies, such as the parallelepiped in figure
'COPY', can be identified -- at least one of its parts, and from this it will
not be hard to find the remaining ones, using a merger like 'FIT' (next
section).


Fig. 'COPY'. Under the 75% criterion (see fig. 'LINEAL'),
face A of the parallelepiped matches completely; faces F and B
match with 3/4 of success, and are accepted. Therefore,
parallelepiped A B F is found. Under 66% of success, we
will still find another parallelepiped: G and E are faces of
3/4 of success, and parallelepiped G E is one with one face
missing: 2/3.
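
The two-threshold rule behind fig. 'LINEAL' and fig. 'COPY' can be sketched
as below. This is a hedged Python illustration, not one of the thesis
programs: it assumes every line and every face carries the same weight, so
that each test reduces to a fraction, and it takes the model of a
parallelepiped to expect three visible faces. The function names are
hypothetical.

```python
from fractions import Fraction


def face_accepted(lines_seen, lines_total, face_threshold):
    """A face passes when enough of its boundary lines were found."""
    return Fraction(lines_seen, lines_total) >= face_threshold


def body_accepted(faces, faces_expected, face_threshold, body_threshold):
    """faces: one (lines_seen, lines_total) pair per candidate face.

    faces_expected is the number of faces the model expects to see.
    """
    good = sum(1 for seen, total in faces
               if face_accepted(seen, total, face_threshold))
    return Fraction(good, faces_expected) >= body_threshold


abf = [(4, 4), (3, 4), (3, 4)]  # faces A, F, B of fig. 'COPY'
ge = [(3, 4), (3, 4)]           # faces G, E; the third face is missing

strict = Fraction(75, 100)
loose = Fraction(66, 100)
print(body_accepted(abf, 3, strict, strict))  # True
print(body_accepted(ge, 3, strict, strict))   # False
print(body_accepted(ge, 3, loose, loose))     # True
```

Exact fractions are used so that 3/4 compared against a 75% threshold does
not depend on floating-point rounding.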


But then things start to get complicated: unless our program is quite
sophisticated, face H is (more or less), according to linear weight, a
parallelogram, and so are faces C and D, so figure C D H will be taken
as a parallelepiped also.


Conclusions.- We can go a considerable distance with simple methods such as
linear weighting; if we decide to use another method, this last one has to
do better than linear weighting.

This observation has application not only here; often enough in
the field of artificial intelligence, people are tempted to choose
complicated but anthropomorphic programs; let them do so, provided that
their algorithms work better than simpler 'machine oriented' methods.
Perhaps the best thing would be to use these simple schemes as some sort
of heuristic that guides and complements the more powerful (but more
expensive in time) tools.


Despite its simplicity, linear weighting (1) has the disadvantage of
not being easily extendable. To be sure, we could develop more complex
schemes that use linear weighting (2) as the main tool but, in the light of
this added sophistication, the weights would probably become for us more a
nuisance than a help. And the cause is clear: evidence pro or contra
does not behave linearly; more interaction among the different facts is
necessary in order to arrive at a sound conclusion than the simple
'majority vote' implied by the weights and the threshold. When you
have easy ways to get information, linear weight will find its way in.
When we have to use more powerful tools and sophisticated methods to
extract relevant facts, we usually need sharper "combining" tools for
making partial conclusions, if for no other reason than because these
better tools are more expensive (time-consuming), so have to be driven with
care and with a more detailed knowledge of the prevailing situation.


Nevertheless, linear weighting has important uses.

Linear weight puts a lower limit of performance which more sophisticated
programs have to exceed if they want to be called "good".


(1) One of the most successful users of weights is Samuel [31].

(2) A proposed refinement of [31] is done by Griffith [A new machine-learning
technique applied to the game of checkers, MAC-M-299 (AI memo 94), March 66].
THE GENERALIZED REGION (GREGION) APPROACH


Generalities.- This section refers to a more involved approach to the
recognition problem; we suppose that several models are available, in
region-notation, and that the regions of the scene to be analyzed have been
found 'correctly', that is, the symbolic description is an accurate, exact
specification of it. We want to find in that scene all instances of a given
object.

The general procedure is as follows: each region in the scene is
analyzed and some of the segments or sides of its boundary are marked
(with dots); these new regions are then classified and merged, and then
comparison (matching) begins with the model. The model guides the
classification and merging, so that there is not a clear cut between the
matching and the merging; during these processes, difficulties may suggest
the inadequacy of the data, so that (1) a new redotting or (2) a new
preprocessing (*) of the given region can be performed, this time with
indication of what to look for and how.

Matching is done at a high level (I am using CONVERT) and, when some
feature is not there, like a line, for instance, we call FIT to extend
the region, or we try to find a reason for this omission; or, as we said,
perhaps FIT does not trust the data any more and decides to get new data, etc.

All of this section is devoted to the explanation of what was just said.

The programs that mark (with dots) the regions and classify them are
written but undebugged; it is my idea to finish this work and implement the
approach described in this section, or an extension of it; originally,
this thesis was going to be about analysis of scenes using the algorithm
which I am about to describe but, since the programs are unfinished ...


(*) Which, in turn, may originate new merging, etc.


Before talking about the gregion approach, we give some


Definitions.


BOUNDARY.- (or SHAPE) of a region. Counterclockwise ordered list of
segments (lines) and vertices that separate the region from others.


VERTEX.- Informally, point where two or more segments meet. Formally,
point where the slope of a line is discontinuous, multivalued (?) or has
a maximum or minimum --inflection point--.

Vertices are inflection points, points where two straight segments or
one curve and one straight meet, or more than two segments meet each
other. They do not need to correspond to 3-dim vertices, although some of
them do.


Y-JOINT.- Or simply 'Y'. Vertex with three rectilinear segments (see
chapter 4 on Polybrick for the use of Y's).


T-JOINT.- Y-joint with exactly two segments having the same slope. More
generally, vertex with 3 segments, two of which are rectilinear and collinear.


SEGMENT.- The finite part of a line between two points in the line (usual
definition).


SCENE.- Internal representation of visual or graphical data. The information
that a scene contains is going to be secured by the process of recognition
or identification of objects. Frequently, I mean by a SCENE a collection of
data as above, but organized in a symbolic format.


MODEL.- A representation of an object or body, used mainly for recognition
purposes, and generally in a non-numeric format (see in chapter 5 the notation
FDL-1 for models; see chapter 7 and the region-notation for models).
Fig. 'DEFINITIONS'. We identify the following entities:
SCENE: One, called 'definitions'.
REGION: four regions: 1, 2, 3, 4. Region 4 is the background.
BOUNDARY: each region has one.
    Boundary of region 3: (point H, segm. HE, point E, segm. EF,
    point F, segm. FG, point G, segm. GH, point H)
VERTEX: A, B, ..., K, L.
T-JOINT: I, D (and, in a more general manner, H, E).
Y-JOINT: D, I. E is not because EH is curved.
SEGMENT: Each region has a set of them.
    Segments of region 3: HE, EF, FG, GH.

REGION.- A simple closed curve of a scene or a model.


DOTTED REGION.- A region processed by 'DOTS' (a program); a region where
some of its segments are 'false' and therefore, dotted.


GREGION.- (generalized region). The merging, as done by 'FIT' (a program),
of two or more dotted regions or gregions, which belong to the same face
of a 3-dim (or 2-dim) body but that, due to occlusion, are disconnected
in the scene.


OBJECT.- (or BODY). A mass of matter distinct from other masses
(usual definition).


FACE.- Any of the surfaces that bound a geometric solid (usual
definition).
Marking the boundary of regions.- Each region of the scene in question
receives the treatment specified in this paragraph.

Each one of the vertices of the boundary is analyzed, looking for
'T' joints; with respect to a region, T-joints may be of one of three
classes: Out, In or Passing.

Once these are identified, the program called DOTS marks, erases, or as I
prefer to say, puts dots on chains of segments between OUT-T's and IN-T's;
for instance, fig. 'PATCH' becomes 'PATCH...'. The dotted segments are 'false'
ones, in the sense that they do not belong to the region, but are occasioned
by overlapping or occlusion. See also fig. 'WEDGES...' (page 91).
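
The first step of this marking, finding the T-joints, follows directly from
the definition given earlier (a vertex with three straight segments, exactly
two of which are collinear). The sketch below is an assumption-laden
illustration in Python, not the DOTS program: vertices are (x, y) pairs, all
three segments are straight, and the tolerance eps is a hypothetical
parameter. It does not attempt the Out / In / Passing classification, which
depends on the region being examined.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); zero when o, a, b are collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dot(o, a, b):
    """Scalar product of (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[0] - o[0]) + (a[1] - o[1]) * (b[1] - o[1])


def t_joint(vertex, neighbours, eps=1e-9):
    """Return (bar, stem) if vertex is a T-joint, else None.

    neighbours: the three points joined to vertex by straight segments.
    bar is the pair of neighbours lying on one straight line through the
    vertex, on opposite sides of it; stem is the remaining neighbour.
    """
    if len(neighbours) != 3:
        return None
    for i in range(3):
        stem = neighbours[i]
        a, b = neighbours[(i + 1) % 3], neighbours[(i + 2) % 3]
        straight = abs(cross(vertex, a, b)) <= eps and dot(vertex, a, b) < 0
        if straight and abs(cross(vertex, a, stem)) > eps:
            return (a, b), stem
    return None


print(t_joint((0, 0), [(-1, 0), (1, 0), (0, -1)]))
# (((-1, 0), (1, 0)), (0, -1))
print(t_joint((0, 0), [(1, 0), (-1, 1), (-1, -1)]))  # None
```

The second call is a Y-joint: three straight segments, no two collinear.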
Fig. 'PATCH'. A scene composed of four regions.


Fig. 'PATCH...'. T-joints are found and dots are
placed in some of the sides of regions 2 and 3.


We erase the 'false' segments with the purpose of facilitating the
comparisons which have to be made.
Unreliability of DOTS.- In some cases, the information obtained by
analyzing the T-joints is not enough to determine completely the 'false'
segments of a given region; that is, we cannot put dots on
all the false segments. In figure 'CARDS', two answers are acceptable,
which in turn correspond to two possible identifications of the scene.


Fig. 'CARDS'. A scene for which the identification
of 'false' boundaries for region C is not unique.


Two different gregions for region C.
If we remove the uppermost card, the interpretation
of the remaining two cards is different depending
on the gregion for C.


Probably this problem will not be serious, since we will, in general,
not be dealing with thin objects.


Classification of the regions.- So far, we have converted the regions to
dotted-regions, using the program DOTS. Now, these dotted regions are
classified in one of the two following forms, the decision depending on
whether there are just a few models to identify (a), or a fair amount (b).

(a) We examine the shapes of the regions of the models, and let us say we
find m different shapes s1, s2, ..., sm. Now, for each dotted-region of
the scene, we compute a vector (p1 p2 p3 ... pm), where pi is the
probability that the region in question has shape si. The pi are not
strictly probabilities, but have a small range, say, 0, 1, ..., 5. The idea
is to make these computations fast, even if they are somewhat unreliable.

(b) If the number of models we try to find in the scene is not small,
instead of (a), it will be better to work with predetermined shapes
S1, S2, ..., Sk, and to compute for them the probability vector as before.

Shapes Si should be "standard" ones, such as parallelogram, ellipsoidal,
long, large, small, rounded, fragmentary (when the region is really
part of a bigger but disconnected region), mess, etc. Classes do not need
to be disjoint.

These computations are done by examining the boundary of each dotted
region, using fast 'rules of thumb', such as: how long is the longest
segment compared to the average segment length; ratio of curved to non-curved
sides; total number of sides; parallel sides, etc.

I may say that the purpose of this pass is to "become familiar" with
the data (with the scene); that is, to have a fair idea of how things look,
where the large regions are situated, etc.

Perhaps a better idea is to make this classification pass after the
merging of regions into gregions; when the number of regions in a scene is
small, use this pass before merging the regions, otherwise merge first
and then classify.
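
A few of these rules of thumb, and one entry of the resulting vector, can be
sketched as follows. This is a rough Python illustration under stated
assumptions: the boundary is a closed polygon given as a list of (x, y)
vertices (curved sides are not handled), and the rule that turns the
features into a 0..5 score for 'parallelogram' is invented here for the
example; the thesis does not specify one.

```python
import math


def sides(boundary):
    """Segments of a closed polygonal boundary given as a list of (x, y)."""
    n = len(boundary)
    return [(boundary[i], boundary[(i + 1) % n]) for i in range(n)]


def length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def parallel(s, t, eps=1e-9):
    """True when the direction vectors of s and t have zero cross product."""
    (ax, ay), (bx, by) = s
    (cx, cy), (dx, dy) = t
    return abs((bx - ax) * (dy - cy) - (by - ay) * (dx - cx)) <= eps


def quick_features(boundary):
    """Cheap measurements taken from the boundary alone."""
    segs = sides(boundary)
    lengths = [length(s) for s in segs]
    pairs = sum(1 for i in range(len(segs)) for j in range(i + 1, len(segs))
                if parallel(segs[i], segs[j]))
    average = sum(lengths) / len(lengths)
    return {"sides": len(segs),
            "longest_over_average": max(lengths) / average,
            "parallel_pairs": pairs}


def parallelogram_score(boundary):
    """Hypothetical 0..5 score: 3 for four sides, 1 per parallel pair."""
    f = quick_features(boundary)
    score = 3 if f["sides"] == 4 else 0
    return score + min(f["parallel_pairs"], 2)


print(parallelogram_score([(0, 0), (4, 0), (5, 2), (1, 2)]))  # 5
print(parallelogram_score([(0, 0), (4, 0), (0, 3)]))          # 0
```

The scores are deliberately crude; as the text says, the pass is meant to be
fast even if it is somewhat unreliable.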


Merging regions into gregions.- Two dotted regions having among them
certain relationships --certain configurations of sides, as used in
Polybrick when trying to find the "next" vertex-- are grouped or merged
into a more general region, called gregion. Gregions are also joined under
similar criteria into gregions. For instance, in fig. 'PATCH...', the
regions 2 and 3 will be merged. This operation is done by the
skeleton-program 'FIT', and is facilitated by the fact that the 'false'
sides or segments have been erased; they do not influence this concatenation.

A question arises: Do we want to make the merging independent of
the shapes of the regions of the model or not? If we choose independence,
the algorithm will be simpler to implement, and faster. If merging is
governed or influenced by the matching algorithm --that is, when we have
information about which region is being compared at this moment--, then
we will have a more powerful merging, but it will be slower and, beyond a
point, we will be exploring branches of the tree with very low probability
of success.

Let me explain it: If two regions more or less fit (they are 'mergeable'),
except in a portion, the comparator or matcher should be called to see if
it finds a third region --the missing linkage--; in looking for this
third region, the matcher will probably call the merger again, because
it has a candidate for the third region that "almost" matches, except that ...
etc. My point is: if two gregions are not mergeable after a few attempts,
they will never be mergeable. It is like having two somewhat distant pieces
when we are working on a jigsaw puzzle, and trying to find the link between
them recursively. Maybe it will work, if the pieces are not far apart;
otherwise, other ways to work the puzzle will generally be easier.
Conclusion: Interaction between the merger and the matcher will be carried
up to a certain (shallow) depth.


For good reasons, 'FIT' (the merger) may not give full credit to
some segment, that is, it could question its authenticity or fidelity; for
instance, see fig. 'CYLIN' in chapter 2. The surface function which obtained
it will be examined, and perhaps the preprocessor will be given a new
(hopefully improved) function to reanalyze the region; also, special
feature-seekers, line followers, etc., could be used at this point.


Comparison: the job of the matcher.- As in DT and TD, the part of the program
that performs the match has two inputs: a model and a scene. This time,
the scene is composed of gregions, since it has already been treated by DOTS
and 'FIT'.

Matching is done at a high level (I am using CONVERT), comparing
feature after feature --that is, segments, the right neighbors, etc.--
When some part of a region is missing --say a line--, we (1) call
FIT to extend the gregion, or (2) we try to justify the absence of the line;
for instance, a dotted side means either (a) end of the gregion or (b)
this gregion is expandable or mergeable, since a dotted or false segment
indicates that the gregion is partially occluded; or (c) indicates that
data is not reliable and FIT will call the preprocessor; or (d) the
comparator or matcher says "this object is not found in this scene."


Unlike TD or DT, matching will be done here interpreting the model
and trying to find in the scene the required features. It looks to me that
the process is complicated enough, and that the formation, as in TD or DT,
of a CONVERT pattern from the model in question will be very complicated.

The comparison program (and FIT also) will use the information
contained in the probability vector, pretty much in a conventional way: when
looking for a parallelogram, it will analyze first the regions with big
chances of being a parallelogram. We must realize that the probability
vectors may be wrong, since they contain information that was gathered in a
quick manner. Another thing that may go wrong is that FIT merged two gregions
that should not be merged; at some point the program has to realize this,
and undo the consequences of its mistake.

Since this is expensive in time, it may pay to have a cautious merger.

As pointed out before, 'DOTS' may also (rarely) fail.
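
The "conventional" use of the probability vector amounts to trying the
candidates in order of decreasing score. The following Python fragment is a
minimal sketch of that ordering only; the table layout (gregion name to a
table of shape scores in 0..5) and the names used are assumptions made for
the example.

```python
def candidates(scores, shape):
    """Gregion names ordered by decreasing score for the given shape.

    scores: dict gregion -> dict shape -> score in 0..5.
    Low-scoring gregions are kept, last, because the quick scores
    may be wrong.
    """
    return sorted(scores, key=lambda g: scores[g].get(shape, 0), reverse=True)


scores = {
    "G1": {"parallelogram": 1, "long": 4},
    "G2": {"parallelogram": 5},
    "G3": {"parallelogram": 3, "fragmentary": 2},
}
print(candidates(scores, "parallelogram"))  # ['G2', 'G3', 'G1']
```

Nothing is discarded here: a gregion with a low score is merely examined
later, which leaves room for the vector to have been mistaken.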


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